Preface to Second Edition


In the second edition we have significantly expanded the chapter on stochastic 
integration in order to give an introduction to modern mathematical finance. 
We have expanded the discussion of Itô’s formula, introduced the Girsanov
transformation and the Feynman-Kac formula, and derived the Black-Scholes
formula for pricing options. We have tried to present this material in the
same style as other topics, that is, without complete mathematical details,
but with enough ideas to explain to the reader why formulas are true. 

We have added a section on maximal inequalities to the martingale sec- 
tion and included more material on Brownian motion. We have included a 
few more examples throughout the book and have increased the number of 
exercises at the end of the chapters. We have also made corrections and mi- 
nor revisions in many places and included some recommendations for further 
reading. 


Preface to First Edition 


This book is an outgrowth of lectures in Mathematics 240, “Applied Stochas- 
tic Processes,” which I have taught a number of times at Duke University. 
The majority of the students in the course are graduate students from de- 
partments other than mathematics, including computer science, economics, 
business, biological sciences, psychology, physics, statistics, and engineering. 
There have also been graduate students from the mathematics department as 
well as some advanced undergraduates. The mathematical background of the 
students varies greatly, and the particular areas of stochastic processes that 
are relevant for their research also vary greatly. 


The prerequisites for using this book are a good calculus-based undergrad- 
uate course in probability and a course in linear algebra including eigenvalues 
and eigenvectors. I also assume that the reader is reasonably computer liter- 
ate. The exercises assume that the reader can write simple programs and has 
access to some software for linear algebra computations. In all of my classes, 
students have had sufficient computer experience to accomplish this. Most 
of the students have also had some exposure to differential equations and I 
use such ideas freely, although I have a short section on linear differential 
equations in the preliminary chapter. 


I have tried to discuss key mathematical ideas in this book, but I have not 
made an attempt to put in all the mathematical details. Measure theory is not 
a prerequisite, but I have tried to present topics in a way such that readers who
have some knowledge of measure theory can fill in details. Although this is a
book intended primarily for people with applications in mind, there are few
real applications discussed. True applications require a good understanding
of the field being studied and it is not a goal of this book to discuss the many 
different fields in which stochastic processes are used. I have instead chosen 
to stick with the very basic examples and let the experts in other fields decide 
when certain mathematical assumptions are appropriate for their application. 


Chapter 1 covers the standard material on finite Markov chains. I have 
not given proofs of the convergence to equilibrium but rather have empha- 
sized the relationship between the convergence to equilibrium and the size of 
the eigenvalues of the stochastic matrix. Chapter 2 deals with infinite state 
space. The notions of transience, null recurrence, and positive recurrence 
are introduced, using as the main example, a random walk on the nonnega- 
tive integers with reflecting boundary. The chapter ends with a discussion of 
branching processes. 


Continuous-time Markov chains are discussed in Chapter 3. The discussion
centers on three main types: Poisson process, finite state space, and birth-
and-death processes. For these processes I have used the forward differential
equations to describe the evolution of the probabilities. This is easier and more 
natural than the backward equations. Unfortunately, the forward equations 
are not a legitimate means to analyze all continuous-time Markov chains and 
this fact is discussed briefly in the last section. One of the main examples of 
a birth-and-death process is a Markovian queue. 

I have included Chapter 4 on optimal stopping of Markov chains as one 
example in the large area of decision theory. Optimal stopping has a nice 
combination of theoretical mathematics leading to an algorithm to solve a 
problem. The basic ideas are also similar to ideas presented in Chapter 5. 

The idea of a martingale is fundamental in much of stochastic processes, 
and the goal of Chapter 5 is to give a solid introduction to these ideas. The 
modern definition of conditional expectation is first discussed and the idea 
of “measurable with respect to $\mathcal{F}_n$, the information available at time n” is
used freely without worrying about giving it a rigorous meaning in terms
of σ-algebras. The major theorems of the area, optional sampling and the
martingale convergence theorem, are discussed as well as their proofs. Proofs 
are important here since part of the theory is to understand why the theorems 
do not always hold. I have included a discussion of uniform integrability. 

The basic ideas of renewal theory are discussed in Chapter 6. For nonlattice 
random variables the renewal equation is used as the main tool of analysis 
while for lattice random variables a Markov chain approach is used. As an 
application, queues with general service times are analyzed. 

Chapter 7 discusses a couple of current topics in the realm of reversible 
Markov chains. First a more mathematical discussion about the rate of con- 
vergence to equilibrium is given, followed by a short introduction to the idea 
of Markov chain algorithms which are becoming very important in some areas 
of physics, computer science, and statistics. The final section on recurrence is 
a nice use of “variational” ideas to prove a result that is hard to prove directly. 

Chapter 8 gives a very quick introduction to a large number of ideas in 
Brownian motion. It is impossible to make any attempt to put in all the math- 
ematical details. I have discussed multidimensional as well as one-dimensional 
Brownian motion and have tried to show why Brownian motion and the heat 
equation are basically the same subject. I have also tried to discuss a little 
of the fractal nature of some of the sets produced by Brownian motion. In 
Chapter 9, a very short introduction to the idea of stochastic integration is 
given. This also is a very informal discussion but is intended to allow the 
students to at least have some ideas of what a stochastic integral is. 

This book has a little more than can be covered in a one semester course. 
In my view the basic course consists of Chapters 1, 2, 3, 5, and 8. Which 
of the remaining chapters I cover depends on the particular students in the 
class that semester. The basic chapters should probably be done in the order 
listed, but the other chapters can be done at any time. Chapters 4 and 7 use 
the previous material on Markov chains; Chapter 6 uses Markov chains and
martingales in the last section; and Chapter 9 uses the definition of Brownian
motion as well as martingales. 

I would like to thank the students in Math 240 in Spring 1992 and Spring 
1994 for their comments and corrections on early versions of these notes. I 
also thank Rick Clelland, who was my assistant when I was preparing the first 
version in 1992, and the reviewers, Michael Phelan and Daniel C. Wiener, for 
their suggestions. During the writing of this book, I was partially supported 
by the National Science Foundation. 


Chapter 0


Preliminaries 


0.1 Introduction 


A stochastic process is a random process evolving with time. More precisely,
a stochastic process is a collection of random variables $X_t$ indexed by time.
In this book, time will always be either a subset of the nonnegative integers
$\{0,1,2,\ldots\}$ or a subset of $[0,\infty)$, the nonnegative real numbers. In the first
case we will call the process discrete time, and in the second case continuous
time. The random variables $X_t$ will take values in a set that we call the state
space. We will consider cases both where the state space is discrete, i.e., a
finite or countably infinite set, and cases where the state space is continuous,
e.g., the real numbers $\mathbb{R}$ or d-dimensional space $\mathbb{R}^d$.

The study of deterministic (nonrandom) processes changing with time leads 
one to the study of differential equations (if time is continuous) or difference 
equations (if time is discrete). A typical (first-order) differential equation is 
of the form 


$$ y'(t) = F(t, y(t)). $$


Here the change in the function y(t) depends only on t and the value y(t)
and not on the values at times before t. A large class of stochastic processes
also have the property that the change at time t is determined by the value of
the process at time t and not by the values at times before t. Such processes
are called Markov processes. The study of such processes is closely related 
to linear algebra, differential equations, and difference equations. We assume 
that the reader is familiar with linear algebra. In the next section we review 
some facts about linear differential equations that will be used and in the 
following section we discuss difference equations. 


0.2 Linear Differential Equations 


Here we briefly review some facts about homogeneous linear differential 
equations with constant coefficients. Readers who want more detail should
consult any introductory text in differential equations. Consider the homogeneous
differential equation


$$ y^{(n)}(t) + a_{n-1} y^{(n-1)}(t) + \cdots + a_1 y'(t) + a_0 y(t) = 0, \qquad (0.1) $$

where $a_0, \ldots, a_{n-1}$ are constants. For any initial conditions

$$ y(0) = b_0, \quad y'(0) = b_1, \quad \ldots, \quad y^{(n-1)}(0) = b_{n-1}, $$


there is a unique solution to (0.1) satisfying these conditions. To obtain such 
a particular solution, we first find the general solution. Suppose $y_1(t), \ldots, y_n(t)$
are linearly independent solutions to (0.1). Then every solution can be
written in the form


$$ y(t) = c_1 y_1(t) + \cdots + c_n y_n(t) $$


for constants $c_1, \ldots, c_n$. For a given set of initial conditions we can determine
the appropriate constants. 

The solutions $y_1, \ldots, y_n$ are found by looking for solutions of the form
$y(t) = e^{\lambda t}$. Plugging in, we see that such a function y(t) satisfies the equation
if and only if


$$ \lambda^n + a_{n-1} \lambda^{n-1} + \cdots + a_1 \lambda + a_0 = 0. $$


If this polynomial has n distinct roots $\lambda_1, \ldots, \lambda_n$, we get n linearly independent
solutions $e^{\lambda_1 t}, \ldots, e^{\lambda_n t}$. The case of repeated roots is a little trickier, but with
a little calculation one can show that if $\lambda$ is a root of multiplicity j, then
$e^{\lambda t}, t e^{\lambda t}, \ldots, t^{j-1} e^{\lambda t}$ are all solutions. Hence for each root of multiplicity j,
we get j linearly independent solutions, and combining them all we get n
linearly independent solutions as required. 

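Now consider the first-order linear system of equations


$$ \begin{aligned}
y_1'(t) &= a_{11} y_1(t) + a_{12} y_2(t) + \cdots + a_{1n} y_n(t) \\
y_2'(t) &= a_{21} y_1(t) + a_{22} y_2(t) + \cdots + a_{2n} y_n(t) \\
&\;\;\vdots \\
y_n'(t) &= a_{n1} y_1(t) + a_{n2} y_2(t) + \cdots + a_{nn} y_n(t).
\end{aligned} $$


This can be written as a single vector-valued equation:


$$ \bar{y}'(t) = A \bar{y}(t). $$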


Here $\bar{y}(t) = [y_1(t), \ldots, y_n(t)]$ (more precisely, the transpose of this vector) and
A is the matrix of coefficients $(a_{ij})$. For any initial vector $\bar{v} = (v_1, \ldots, v_n)$,
there is a unique solution to this equation satisfying $\bar{y}(0) = \bar{v}$. This solution
can most easily be written in terms of the exponential of the matrix,


$$ \bar{y}(t) = e^{tA} \bar{v}. $$

This exponential can be defined in terms of a power series:


$$ e^{tA} = \sum_{j=0}^{\infty} \frac{(tA)^j}{j!}. $$


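For computational purposes one generally tries to diagonalize the matrix A.
Suppose that $A = Q^{-1} D Q$ for some diagonal matrix


$$ D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix}. $$

Then

$$ e^{tA} = Q^{-1} e^{tD} Q = Q^{-1} \begin{pmatrix} e^{t d_1} & 0 & \cdots & 0 \\ 0 & e^{t d_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & e^{t d_n} \end{pmatrix} Q. $$


It is not true that every matrix can be diagonalized as above. However, every
matrix A can be written as $Q^{-1} J Q$ where J is in Jordan canonical form.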
Taking exponentials of matrices in Jordan form is only slightly more difficult 
than taking exponentials of diagonal matrices. See a text on linear algebra 
for more details. 
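
As a rough numerical illustration of the power series definition, the following sketch approximates $e^{tA}$ by a partial sum and applies it to a small system. The helper names `mat_mul` and `mat_exp`, the truncation at 40 terms, and the test system are assumptions made for this example only; for large or badly scaled matrices a truncated series is not a reliable method.

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, t, terms=40):
    """Approximate e^{tA} by the partial sum of (tA)^j / j! for j < terms."""
    n = len(A)
    tA = [[t * a for a in row] for row in A]
    # term holds (tA)^j / j!, starting with the identity for j = 0
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for j in range(1, terms):
        term = [[x / j for x in row] for row in mat_mul(term, tA)]
        total = [[total[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return total

# Hypothetical test system: y1' = y2, y2' = -y1 with y(0) = (1, 0),
# whose solution is y(t) = (cos t, -sin t).
A = [[0.0, 1.0], [-1.0, 0.0]]
v = [1.0, 0.0]
E = mat_exp(A, 1.0)
y = [sum(E[i][k] * v[k] for k in range(2)) for i in range(2)]
print(y, (math.cos(1.0), -math.sin(1.0)))
```

The two printed pairs should agree to many decimal places, which is a quick check that $e^{tA}\bar{v}$ solves the system.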


0.3 Linear Difference Equations


The theory of linear difference equations is very similar to that of linear 
differential equations. However, since the theory is generally not studied in 
introductory differential equations courses and since difference equations arise 
naturally in discrete-time Markov chains, we will discuss their solution in more 
detail. First consider the equation 


$$ f(n) = a f(n-1) + b f(n+1), \quad K < n < N. \qquad (0.2) $$


Here f(n) is a function defined for integers $K \leq n \leq N$ (N can be chosen
to be infinity) and a, b are nonzero real numbers. If f satisfies (0.2) and the
values f(K) and f(K + 1) are known, then f(n) can be determined for all
$K \leq n \leq N$ recursively by the formula


$$ f(n+1) = \frac{1}{b} \left[ f(n) - a f(n-1) \right]. \qquad (0.3) $$

Conversely, if $u_0, u_1$ are any real numbers we can find a solution to (0.2)
satisfying $f(K) = u_0$, $f(K+1) = u_1$ by defining f(n) recursively as in (0.3).
Also, we note that the set of functions satisfying (0.2) is a vector space, i.e., if
$f_1, f_2$ satisfy (0.2) then so does $c_1 f_1 + c_2 f_2$, where $c_1, c_2$ are any real numbers.
This vector space has dimension 2; in fact, a basis for the vector space is given
by $\{f_1, f_2\}$, where $f_1$ is the solution satisfying $f_1(K) = 1$, $f_1(K+1) = 0$ and
$f_2$ is the solution satisfying $f_2(K) = 0$, $f_2(K+1) = 1$. If $g_1$ and $g_2$ are any two
linearly independent solutions, then it is a standard fact from linear algebra
that every solution is of the form


$$ c_1 g_1 + c_2 g_2 $$


for constants $c_1, c_2$.

We now make some good guesses to find a pair of linearly independent 
solutions. We will try functions of the form $f(n) = \alpha^n$ for some $\alpha \neq 0$. This
is a solution for a particular $\alpha$ if and only if


$$ \alpha^n = a \alpha^{n-1} + b \alpha^{n+1}, \quad K < n < N, $$

i.e., if and only if

$$ \alpha = a + b \alpha^2. $$

We can solve this with the quadratic formula, giving


$$ \alpha = \frac{1 \pm \sqrt{1 - 4ab}}{2b}. $$


Case I: $1 - 4ab \neq 0$. In this case there are two distinct roots, $\alpha_1, \alpha_2$, and
hence the general solution is


$$ f(n) = c_1 \alpha_1^n + c_2 \alpha_2^n. \qquad (0.4) $$

Case II: $1 - 4ab = 0$. In this case we get only one solution of this type,
$g_1(n) = \alpha^n = (1/2b)^n$. However, if we let $g_2(n) = n(1/2b)^n$ we see that

$$ \begin{aligned}
a g_2(n-1) + b g_2(n+1) &= a(n-1)(1/2b)^{n-1} + b(n+1)(1/2b)^{n+1} \\
&= (1/2b)^n \left[ a(n-1)2b + b(n+1)/(2b) \right] \\
&= n(1/2b)^n = g_2(n).
\end{aligned} $$


Therefore $g_2$ is also a solution. It is easy to check that $g_1, g_2$ are linearly
independent, so every solution is of the form


$$ f(n) = c_1 (1/2b)^n + c_2 n (1/2b)^n. $$

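Example. Suppose we want to find a function f satisfying


$$ f(n) = \frac{1}{6} f(n-1) + \frac{2}{3} f(n+1), \quad 0 < n < \infty, $$

with f(0) = 4, f(1) = 3. Plugging in we get,


$$ \alpha = \frac{3 \pm \sqrt{5}}{4}. $$


The general solution is


$$ f(n) = c_1 \left( \frac{3 + \sqrt{5}}{4} \right)^n + c_2 \left( \frac{3 - \sqrt{5}}{4} \right)^n. $$


If we plug in the initial conditions, we get


$$ 4 = f(0) = c_1 + c_2, $$

$$ 3 = f(1) = c_1 \frac{3 + \sqrt{5}}{4} + c_2 \frac{3 - \sqrt{5}}{4}. $$


Solving gives $c_1 = 2$, $c_2 = 2$, and hence

$$ f(n) = 2 \left( \frac{3 + \sqrt{5}}{4} \right)^n + 2 \left( \frac{3 - \sqrt{5}}{4} \right)^n. $$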
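
A quick numerical cross-check of this example is sketched below. It assumes the coefficients a = 1/6 and b = 2/3, which are the values consistent with the roots $(3 \pm \sqrt{5})/4$, and compares the recursion (0.3) started from f(0) = 4, f(1) = 3 with the closed form. The function names are chosen for this illustration only.

```python
import math

a, b = 1 / 6, 2 / 3                  # assumed coefficients of the example
disc = math.sqrt(1 - 4 * a * b)
alpha1 = (1 + disc) / (2 * b)        # (3 + sqrt(5)) / 4
alpha2 = (1 - disc) / (2 * b)        # (3 - sqrt(5)) / 4

def closed_form(n):
    """General solution (0.4) with c1 = c2 = 2."""
    return 2 * alpha1 ** n + 2 * alpha2 ** n

def by_recursion(n_max):
    """Values f(0), ..., f(n_max) computed from formula (0.3)."""
    f = [4.0, 3.0]
    for n in range(1, n_max):
        f.append((f[n] - a * f[n - 1]) / b)
    return f

values = by_recursion(10)
for n, value in enumerate(values):
    print(n, value, closed_form(n))
```

The two columns should agree up to floating-point rounding.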


We have seen that the values of f(K) and f(K +1) uniquely determine 
the solution to (0.2). Sometimes, one is given the boundary values f(K) and
f(N). These boundary value problems can be solved in the same way: write
down the general solution and solve for the constants. For example, suppose
we want the function f which satisfies


$$ f(n) = 2 f(n-1) - f(n+1), \quad 0 < n < 10, $$

with f(0) = 0, f(10) = 1. We write down the general solution

$$ f(n) = c_1 1^n + c_2 (-2)^n. $$

Plugging in the boundary conditions gives


$$ f(0) = 0 = c_1 + c_2, $$

$$ f(10) = 1 = c_1 + c_2 (-2)^{10}, $$


and $c_1 = -c_2 = 1/(1 - 2^{10})$.
In the study of random walks, the difference equations 


$$ f(n) = (1-p) f(n-1) + p f(n+1), \quad p \in (0,1), $$

arise. If $p \neq 1/2$, we obtain two roots $\alpha_1 = 1$, $\alpha_2 = (1-p)/p$, and hence the
general solution is


$$ f(n) = c_1 + c_2 \left( \frac{1-p}{p} \right)^n. \qquad (0.5) $$

If p = 1/2, $\alpha = 1$ is a repeated root so we get the general solution


$$ f(n) = c_1 + c_2 n. \qquad (0.6) $$
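
To see (0.5) and (0.6) in use, the sketch below fits the constants to a pair of boundary values and then checks the difference equation at every interior point. The boundary conditions f(0) = 0, f(N) = 1 and the function name `boundary_solution` are assumptions made for this illustration.

```python
def boundary_solution(p, N):
    """Values f(0..N) with f(n) = (1-p) f(n-1) + p f(n+1), f(0) = 0, f(N) = 1."""
    if p == 0.5:
        # (0.6): f(n) = c1 + c2 n with c1 = 0, c2 = 1/N
        return [n / N for n in range(N + 1)]
    r = (1 - p) / p
    # (0.5): c1 + c2 = 0 and c1 + c2 r^N = 1
    c2 = 1 / (r ** N - 1)
    c1 = -c2
    return [c1 + c2 * r ** n for n in range(N + 1)]

for p in (0.3, 0.5, 0.7):
    f = boundary_solution(p, 10)
    worst = max(abs(f[n] - (1 - p) * f[n - 1] - p * f[n + 1]) for n in range(1, 10))
    print(p, f[5], worst)
```

The last printed number is the largest residual of the difference equation and should be at the level of rounding error.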


What we have analyzed are second-order linear difference equations. The 
general kth-order homogeneous linear difference equation is of the form 


$$ f(n+k) = a_0 f(n) + a_1 f(n+1) + \cdots + a_{k-1} f(n+k-1). \qquad (0.7) $$


Suppose we wish to find a function satisfying (0.7) for $n \geq 0$. It suffices to
give the values $f(0), \ldots, f(k-1)$, for then $f(n)$, $n \geq k$, can be determined
recursively. Again we look for solutions of the form $f(n) = \alpha^n$. Such an f is
a solution if and only if


$$ \alpha^k = a_0 + a_1 \alpha + \cdots + a_{k-1} \alpha^{k-1}. $$


As before, if there are k distinct roots of this equation, we get k linearly
independent solutions. If a certain $\alpha$ is a root with multiplicity j, one can
check in fact that


$$ \alpha^n, \; n\alpha^n, \; n^2 \alpha^n, \; \ldots, \; n^{j-1} \alpha^n $$


are all linearly independent solutions. In complete parallel with the case of
linear differential equations, we get k linearly independent solutions to (0.7)
and we can find all solutions by taking linear combinations of these solutions.


0.4 Exercises 


0.1 Find all functions x(t), y(t) satisfying 
$$ x'(t) = y(t) - x(t), $$
$$ y'(t) = 3x(t) - 3y(t). $$
Find the particular pair of functions satisfying x(0) = y(0) = 1/2. 
0.2 Find the function f(n),n = 0,1,...,10 that satisfies 


fn) = h(n 1) + Tf@4D), = 1,2,...,9, 
0.3 The Fibonacci numbers $F_n$ are defined by $F_1 = 1$, $F_2 = 1$ and for $n > 2$,
$F_n = F_{n-1} + F_{n-2}$. Find a formula for $F_n$ by solving the difference equation.


0.4 Find the function f(n), n =0,1,2,... that satisfies 


fn) = 5f(n-1) + gmt) + sfln+2), n>, 


lim. Fi) 


T— CO 


0.5 Find all functions f from the integers to the real numbers satisfying 


$$ f(n) = \frac{1}{2} f(n+1) + \frac{1}{2} f(n-1) - 1. \qquad (0.8) $$


[Hint: First show that $f(n) = n^2$ satisfies (0.8). Then suppose $f_1(n)$ and $f_2(n)$
both satisfy (0.8) and find the equation that $g(n) = f_2(n) - f_1(n)$ satisfies.]


0.6 (a) Find all functions f from the real numbers to the real numbers such
that for all x,


$$ f''(x) + f'(x) + f(x) = 0. $$


(b) Find all functions f from the integers to the real numbers such that for
all n,


$$ f(n+2) = -f(n) - f(n+1). $$


Chapter 1 


Finite Markov Chains 


1.1 Definitions and Examples 


Consider a discrete-time stochastic process, $X_n$, $n = 0,1,2,\ldots$, where $X_n$
takes values in the finite set $S = \{1,\ldots,N\}$ or $\{0,\ldots,N-1\}$. We call the
possible values for $X_n$ the states of the system. To describe the probabilities
for such a process we need to give the values of


$$ P\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\}, $$


for every n and every finite sequence of states $(i_0, \ldots, i_n)$. Equivalently, we
could give the initial probability distribution


$$ \phi(i) = P\{X_0 = i\}, \quad i = 1, \ldots, N, $$

and the “transition probabilities,”

$$ q_n(i_n \mid i_0, \ldots, i_{n-1}) = P\{X_n = i_n \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}\}, \qquad (1.1) $$

for then


$$ P\{X_0 = i_0, \ldots, X_n = i_n\} = \phi(i_0) \, q_1(i_1 \mid i_0) \, q_2(i_2 \mid i_0, i_1) \cdots q_n(i_n \mid i_0, \ldots, i_{n-1}). \qquad (1.2) $$


In this chapter we consider a special class of such processes, those that 
satisfy the Markov property. The Markov property states that to make
predictions of the behavior of a system in the future, it suffices to consider only
the present state of the system and not the past history. That is to say,
the state of the system is important but not how it arrived at that state.
Mathematically, we can write this as


$$ P\{X_n = i_n \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}\} = P\{X_n = i_n \mid X_{n-1} = i_{n-1}\}. $$


We will also make the assumption that the transition probabilities do not
depend on time. This is called time homogeneity. A time-homogeneous Markov
chain is a process such that


$$ P\{X_n = i_n \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}\} = p(i_{n-1}, i_n), $$

for some function $p: S \times S \to [0,1]$. Unless explicitly stated otherwise in this
book, when we say Markov chain we will mean time-homogeneous Markov
chain. To give the probabilities for a Markov chain, we need to give an initial
probability distribution $\phi(i) = P\{X_0 = i\}$, and the transition probabilities
$p(i,j)$, for then, by (1.2),


$$ P\{X_0 = i_0, \ldots, X_n = i_n\} = \phi(i_0) \, p(i_0, i_1) \, p(i_1, i_2) \cdots p(i_{n-1}, i_n). \qquad (1.3) $$


The transition matrix P for the Markov chain is the N x N matrix whose
(i,j) entry $P_{ij}$ is $p(i,j)$. The matrix P is a stochastic matrix, i.e.,


$$ 0 \leq P_{ij} \leq 1, \quad 1 \leq i,j \leq N, \qquad (1.4) $$

$$ \sum_{j=1}^{N} P_{ij} = 1, \quad 1 \leq i \leq N. \qquad (1.5) $$


Any matrix satisfying (1.4) and (1.5) can be the transition matrix for a Markov 
chain. 
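
The conditions (1.4) and (1.5) and the path formula (1.3) are easy to check by machine. The sketch below does so with exact rational arithmetic; the function names, the 0-based state labels, and the sample two-state matrix are assumptions made for this illustration.

```python
from fractions import Fraction as F

def is_stochastic(P):
    """Check (1.4) and (1.5): entries in [0, 1] and every row sums to 1."""
    return all(all(0 <= x <= 1 for x in row) and sum(row) == 1 for row in P)

def path_probability(phi, P, path):
    """Probability (1.3) that the chain follows the states in path."""
    prob = phi[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]
    return prob

# Hypothetical two-state chain on S = {0, 1}, started in state 0.
P = [[F(3, 4), F(1, 4)],
     [F(1, 6), F(5, 6)]]
phi = [F(1), F(0)]

print(is_stochastic(P))
print(path_probability(phi, P, [0, 0, 1, 1]))   # 1 * 3/4 * 1/4 * 5/6
```

Exact fractions are used so that the row sums can be compared with 1 without rounding concerns.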


Example 1. Two-state Markov chain. Let us give a simple model for 
the state of a phone where $X_n = 0$ means that the phone is free at time n
and $X_n = 1$ means that the phone is busy. We assume that during each time
interval there is a probability p that a call comes in (for ease we will assume
that no more than one call comes in during any particular time interval). If
the phone is busy during that period, the incoming call does not get through.
We also assume that if the phone is busy during a time interval, there is a
probability q that it will be free during the next interval. Our model gives a
Markov chain with state space S = {0,1} and matrix


$$ P = \begin{array}{c|cc} & 0 & 1 \\ \hline 0 & 1-p & p \\ 1 & q & 1-q \end{array} = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}. $$

This matrix gives the general form for a transition matrix of a two-state Markov
chain. In order to specify the matrix one only needs to give the values of p
and q. We have written the matrix in two different ways. The first way labels
the states and the latter way does not. We will use both notations in this
chapter.


Example 2. Simple Queueing Model. We modify the previous example 
by assuming that the phone system can put one caller on hold. Hence at any 
time the number of callers in the system is in the set S = {0,1,2}. Again, 
any call will be completed during a time interval with probability q and a new 
caller will come in with probability p, unless the system is already full. To 
model this we set 


$$ p(0,0) = 1 - p, \quad p(0,1) = p, \quad p(0,2) = 0, $$

since a caller comes in with probability p (again we are assuming only one
caller arrives during any time period). Also,


$$ p(2,0) = 0, \quad p(2,1) = q, \quad p(2,2) = 1 - q, $$


since no new callers may arrive if there are two callers in the system, and
both calls may not end simultaneously. If there is exactly one caller in the
system, it is a little more complicated. The state of the system goes from 1
to 0 if the current call is completed and no new callers enter the system, i.e.,
$p(1,0) = q(1-p)$. Similarly, the state goes from 1 to 2 if the current call is
not completed but a new call arrives, i.e., $p(1,2) = p(1-q)$. Since the rows
must add to 1, $p(1,1) = 1 - q(1-p) - p(1-q)$ and hence


$$ P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 1-p & p & 0 \\ 1 & q(1-p) & 1 - q(1-p) - p(1-q) & p(1-q) \\ 2 & 0 & q & 1-q \end{array}. $$
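
As a small sanity check on this model, the sketch below builds the 3 x 3 matrix from p and q and confirms that each row sums to 1. The function name `queue_matrix` and the sample values p = 1/4, q = 1/6 are assumptions made for this illustration.

```python
from fractions import Fraction as F

def queue_matrix(p, q):
    """Transition matrix of the one-caller-on-hold model on S = {0, 1, 2}."""
    return [
        [1 - p, p, 0],
        [q * (1 - p), 1 - q * (1 - p) - p * (1 - q), p * (1 - q)],
        [0, q, 1 - q],
    ]

P = queue_matrix(F(1, 4), F(1, 6))
for row in P:
    print([str(x) for x in row], "row sum =", sum(row))
```

With these sample values the middle row works out to 1/8, 2/3, 5/24.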


Transition probabilities are often represented by directed graphs, where the 
vertices of the graphs are the states and the arrows represent the transitions. 
The above matrix can be represented graphically as follows: 


Example 3. Random Walk with Reflecting Boundary. Consider a
“random walker” moving along the sites $\{0,1,\ldots,N\}$.

[Figure: the sites 0, 1, 2, ..., N-1, N arranged on a line.]


At each time step the walker moves one step, to the right with probability p
and to the left with probability 1 - p. If the walker is at one of the boundary
points {0, N}, the walker moves with probability 1 toward the inside of the
interval. The transition matrix P for this Markov chain is given by


$$ p(i,i+1) = p, \quad p(i,i-1) = 1 - p, \quad 0 < i < N, $$

$$ p(0,1) = 1, \quad p(N,N-1) = 1, $$


with $p(i,j) = 0$ for other values of i, j. If p = 1/2, we call this symmetric
or unbiased random walk with reflecting boundaries. If $p \neq 1/2$ it is called
biased random walk. Sometimes it is more convenient to consider partially
reflecting boundaries where the walker at the boundary moves the same as
on the inside except that if the walker tries to leave the states $\{0,\ldots,N\}$ he
runs into a wall and goes nowhere. This corresponds to boundary conditions


$$ p(0,0) = 1 - p, \quad p(0,1) = p, \quad p(N,N-1) = 1 - p, \quad p(N,N) = p. $$


Example 4. Random Walk with Absorbing Boundaries. This chain 
is like the previous example except that when the walker reaches 0 or N, the 
walker stays there forever. The transition matrix is given by 


$$ p(i,i+1) = p, \quad p(i,i-1) = 1 - p, \quad 0 < i < N, $$


$$ p(0,0) = 1, \quad p(N,N) = 1. $$


(We adopt the convention from here on that if $p(i,j)$ is not specified for a
particular i, j then it is assumed to be 0.)


Example 5. Simple Random Walk on a Graph. A (finite, simple, 
undirected) graph is a finite collection of vertices V and a collection of edges 
E where each edge connects two different vertices and any two vertices are
connected by at most one edge. We write $v_1 \sim v_2$ if vertices $v_1$ and $v_2$ are
adjacent, i.e., an edge connects the two vertices.


Consider the Markov chain whose states are the vertices of the graph. At
each time interval, the chain chooses a new state randomly from among the
states adjacent to the current state. The transition matrix for this chain is
given by


$$ p(v_i, v_j) = 1/d(v_i), \quad v_i \sim v_j, $$


where $d(v_i)$ is the number of vertices adjacent to $v_i$ [if $d(v_i) = 0$, we let
$p(v_i, v_i) = 1$]. This chain is called simple random walk on the graph.
Symmetric random walk (p = 1/2) with reflecting boundaries as in Example 3 is
a particular example of a simple random walk on a graph. 
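
The construction of the transition probabilities from an edge list can be sketched as follows. The function name `simple_random_walk`, the dictionary representation, and the five-vertex path graph are assumptions made for this illustration; the path graph is chosen because it should reproduce the symmetric reflecting walk of Example 3 with N = 4.

```python
from fractions import Fraction as F

def simple_random_walk(vertices, edges):
    """Return P as a dict of dicts with P[v][w] = 1/d(v) when v ~ w."""
    neighbors = {v: set() for v in vertices}
    for v, w in edges:
        neighbors[v].add(w)
        neighbors[w].add(v)
    P = {v: {w: F(0) for w in vertices} for v in vertices}
    for v in vertices:
        if not neighbors[v]:
            P[v][v] = F(1)                      # isolated vertex: d(v) = 0
        for w in neighbors[v]:
            P[v][w] = F(1, len(neighbors[v]))
    return P

# Path graph 0 - 1 - 2 - 3 - 4.
V = list(range(5))
E = [(i, i + 1) for i in range(4)]
P = simple_random_walk(V, E)
for v in V:
    print(v, [str(P[v][w]) for w in V])
```

The endpoint rows put probability 1 on the single neighbor, and the interior rows put probability 1/2 on each side.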
Given a transition matrix P and an initial probability distribution $\phi$, how
can we determine the probability that the Markov chain will be in a certain
state i at a given time n? Define the n-step probabilities $p_n(i,j)$ by


$$ p_n(i,j) = P\{X_n = j \mid X_0 = i\} = P\{X_{n+k} = j \mid X_k = i\} $$


(the latter equality holds because of time homogeneity). Then


$$ P\{X_n = j\} = \sum_{i \in S} \phi(i) \, P\{X_n = j \mid X_0 = i\}. \qquad (1.6) $$


We will now show that the n-step transition probability $p_n(i,j)$ is in fact the
(i,j) entry in the matrix $P^n$. To see this, we first note that this is trivially
true for n = 1. Assume it is true for a given n. Then,


$$ \begin{aligned}
p_{n+1}(i,j) = P\{X_{n+1} = j \mid X_0 = i\} &= \sum_{k \in S} P\{X_n = k \mid X_0 = i\} \, P\{X_{n+1} = j \mid X_n = k\} \\
&= \sum_{k \in S} p_n(i,k) \, p(k,j).
\end{aligned} $$


But if $p_n(i,k)$ is the (i,k) entry of $P^n$, the last sum is exactly the (i,j) entry
of $P^n P = P^{n+1}$.
An initial probability distribution can be given by a vector


$$ \bar{\phi}_0 = (\phi_0(1), \ldots, \phi_0(N)). $$


[We will denote the vector $(v(1), \ldots, v(N))$ by $\bar{v}$. We will use the same
notation whether $\bar{v}$ is to be considered a row vector or a column vector. For
example, we can write either $\bar{v}P$ or $P\bar{v}$ although $\bar{v}$ is a row vector in the first
case and a column vector in the second.] If $\bar{\phi}_0$ is given, the distribution at
time n, $\phi_n(i) = P\{X_n = i\}$, is given by


$$ \bar{\phi}_n = \bar{\phi}_0 P^n. $$


Example 6. Consider Example 1 and assume the phone is free at time 0. 
Assume p = 1/4 and q = 1/6. Let n = 6. Then 


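$$ P^6 = \begin{pmatrix} 3/4 & 1/4 \\ 1/6 & 5/6 \end{pmatrix}^6 \approx \begin{pmatrix} .424 & .576 \\ .384 & .616 \end{pmatrix}. $$


If the phone is free at time 0, $\bar{\phi}_0 = (1,0)$. If we want to know the probability
that the phone is busy at time 6 given that it was free at time 0, we compute


$$ (\bar{\phi}_0 P^6)(1) = .576. $$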
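
The computation in this example can be reproduced with a few lines of code. The sketch below raises the two-state matrix with p = 1/4, q = 1/6 to the sixth power by repeated multiplication; the helper names `mat_mul` and `mat_pow` are assumptions made for this illustration.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """Compute P^n for n >= 1 by repeated multiplication."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

P = [[F(3, 4), F(1, 4)],
     [F(1, 6), F(5, 6)]]
P6 = mat_pow(P, 6)

phi0 = [F(1), F(0)]                  # phone free at time 0
phi6 = [sum(phi0[i] * P6[i][j] for i in range(2)) for j in range(2)]
print([round(float(x), 3) for x in phi6])
```

The second component of the printed vector is the probability that the phone is busy at time 6.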
1.2 Large-Time Behavior and Invariant Probability


Understanding the large-time behavior of a Markov chain boils down to
understanding the behavior of $P^n$ for large n values. Let us start by considering
a particular example,


$$ P = \begin{pmatrix} 3/4 & 1/4 \\ 1/6 & 5/6 \end{pmatrix}. $$


Taking powers of this matrix is easy (with a computer) and one can quickly
see that

$$ P^n \approx \begin{pmatrix} 2/5 & 3/5 \\ 2/5 & 3/5 \end{pmatrix} $$

for large n, i.e., a limit matrix


$$ \Pi = \lim_{n \to \infty} P^n $$

exists and the rows of $\Pi$ are identical. If $\bar{v}$ is any probability vector [we say
a vector $\bar{v} = (v(1), \ldots, v(N))$ is a probability vector if the components are
nonnegative and sum to 1], then

$$ \lim_{n \to \infty} \bar{v} P^n = \bar{\pi}, $$

where $\bar{\pi} = (2/5, 3/5)$ is one of the rows of $\Pi$. For another example, consider
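Example 2 of Section 1.1 with p = 1/4, q = 1/6,


$$ P = \begin{pmatrix} 3/4 & 1/4 & 0 \\ 1/8 & 2/3 & 5/24 \\ 0 & 1/6 & 5/6 \end{pmatrix}. \qquad (1.7) $$


We see the same phenomenon. In this case for large n,


$$ P^n \approx \begin{pmatrix} .182 & .364 & .455 \\ .182 & .364 & .455 \\ .182 & .364 & .455 \end{pmatrix} = \begin{pmatrix} \bar{\pi} \\ \bar{\pi} \\ \bar{\pi} \end{pmatrix}, $$


where $\bar{\pi} = (2/11, 4/11, 5/11)$ and hence for every probability vector $\bar{v}$,


$$ \lim_{n \to \infty} \bar{v} P^n = \bar{\pi}. $$

At any large time, the probability that the phone has no callers is about
$\pi(0) = 2/11$, regardless of what the state of the system was at time 0.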
Suppose $\bar{\pi}$ is a limiting probability vector, i.e., for some initial probability
vector $\bar{v}$,


$$ \bar{\pi} = \lim_{n \to \infty} \bar{v} P^n. $$

Then


$$ \bar{\pi} = \lim_{n \to \infty} \bar{v} P^{n+1} = \left( \lim_{n \to \infty} \bar{v} P^n \right) P = \bar{\pi} P. $$


We call a probability vector $\bar{\pi}$ an invariant probability distribution for P if

$$ \bar{\pi} = \bar{\pi} P. \qquad (1.8) $$


Such a 7 is also called a stationary, equilibrium, or steady-state probability 
distribution. Note that an invariant probability vector is a left eigenvector of 
P with eigenvalue 1. 
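
A short sketch of how (1.8) can be checked, and how an invariant vector can be approached numerically, is given below for the queueing matrix (1.7). The helper name `left_multiply`, the choice of 200 iterations, and the starting vector are assumptions made for this illustration; repeated multiplication is only one of several ways to find an invariant vector.

```python
from fractions import Fraction as F

P = [[F(3, 4), F(1, 4), F(0)],
     [F(1, 8), F(2, 3), F(5, 24)],
     [F(0), F(1, 6), F(5, 6)]]

def left_multiply(v, M):
    """Row vector times matrix: (vM)(j) = sum over i of v(i) M(i, j)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# Exact check that (2/11, 4/11, 5/11) satisfies (1.8).
pi = [F(2, 11), F(4, 11), F(5, 11)]
print(left_multiply(pi, P) == pi)

# Approximate the limit of v P^n in floating point from an arbitrary start.
P_float = [[float(x) for x in row] for row in P]
v = [1.0, 0.0, 0.0]
for _ in range(200):
    v = left_multiply(v, P_float)
print(v)
```

The first check is exact because it uses fractions; the iterated vector should agree with (2/11, 4/11, 5/11) to many decimal places.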

There are three natural questions to ask about invariant probability distri- 
butions for stochastic matrices: 

1) Does every stochastic matrix P have an invariant probability distribution
$\bar{\pi}$?

2) Is the invariant probability distribution unique?

3) When can we conclude that


$$ \lim_{n \to \infty} P^n = \begin{pmatrix} \bar{\pi} \\ \bar{\pi} \\ \vdots \\ \bar{\pi} \end{pmatrix} $$


and hence that for all initial probability distributions $\bar{v}$,


$$ \lim_{n \to \infty} \bar{v} P^n = \bar{\pi}? $$


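Let us start by considering the two-state Markov chain with

$$ P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}, $$


where 0 < p, q < 1. This matrix has eigenvalues 1 and 1 - p - q. We can
diagonalize P,


$$ D = Q^{-1} P Q, $$

where


$$ Q = \begin{pmatrix} 1 & -p \\ 1 & q \end{pmatrix}, \quad Q^{-1} = \begin{pmatrix} q/(p+q) & p/(p+q) \\ -1/(p+q) & 1/(p+q) \end{pmatrix}, $$


$$ D = \begin{pmatrix} 1 & 0 \\ 0 & 1-p-q \end{pmatrix}. $$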

The columns of Q are right eigenvectors of P and the rows of $Q^{-1}$ are left
eigenvectors. The eigenvectors are unique up to a multiplicative constant.
We have chosen the constant in the left eigenvector for eigenvalue 1 so that
it is a probability vector. $\bar{\pi} = (q/(p+q), p/(p+q))$ is the unique invariant
probability distribution for P. Once P is diagonalized it is easy to raise P to
powers,


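$$ \begin{aligned}
P^n &= (Q D Q^{-1})^n \\
&= Q D^n Q^{-1} \\
&= Q \begin{pmatrix} 1 & 0 \\ 0 & (1-p-q)^n \end{pmatrix} Q^{-1} \\
&= \begin{pmatrix} [q + p(1-p-q)^n]/(p+q) & [p - p(1-p-q)^n]/(p+q) \\ [q - q(1-p-q)^n]/(p+q) & [p + q(1-p-q)^n]/(p+q) \end{pmatrix}.
\end{aligned} $$

Since $|1 - p - q| < 1$, we see that


$$ \lim_{n \to \infty} P^n = \begin{pmatrix} q/(p+q) & p/(p+q) \\ q/(p+q) & p/(p+q) \end{pmatrix} = \begin{pmatrix} \bar{\pi} \\ \bar{\pi} \end{pmatrix}. $$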


The key to the computation of the limit is the fact that the second eigenvalue 
1 - p - q has absolute value less than 1 and so the dominant contribution to $P^n$
comes from the eigenvector with eigenvalue 1, i.e., the invariant probability 
distribution. 
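
The closed form for the n-th power of a two-state matrix can be compared with direct matrix multiplication. The sketch below does this in exact arithmetic; the function names and the sample values p = 1/4, q = 1/6 are assumptions made for this illustration.

```python
from fractions import Fraction as F

def two_state_power(p, q, n):
    """Closed form for P^n obtained from the diagonalization of P."""
    lam = (1 - p - q) ** n
    s = p + q
    return [[(q + p * lam) / s, (p - p * lam) / s],
            [(q - q * lam) / s, (p + q * lam) / s]]

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

p, q = F(1, 4), F(1, 6)
P = [[1 - p, p], [q, 1 - q]]

direct = P
for n in range(1, 11):
    print(n, direct == two_state_power(p, q, n))
    direct = mat_mul(direct, P)
```

Each printed comparison should be True, and as n grows the term (1 - p - q)^n shrinks, so both rows approach (q/(p+q), p/(p+q)).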

Suppose P is any stochastic matrix. It is easy to check that the vector 
$\bar{1} = (1,1,\ldots,1)$ is a right eigenvector with eigenvalue 1. Hence at least one
left eigenvector for eigenvalue 1 exists. Suppose we can show that: 


The left eigenvector can be chosen to have all nonnegative entries, (1.9) 


The eigenvalue 1 is simple and all other eigenvalues 


have absolute value less than 1. (1.10) 


Then we can show that essentially the same thing happens as in the two-state 
case. It is not always true that we can diagonalize P; however, we can do
well enough using a Jordan decomposition (consult a text in linear algebra
for details): there exists a matrix Q such that


$$ D = Q^{-1} P Q, $$


where the first row of $Q^{-1}$ is the unique invariant probability vector $\bar{\pi}$; the
first column of Q contains all 1s. The matrix D is not necessarily diagonal
but it does have the form

$$ D = \begin{pmatrix} 1 & 0 \\ 0 & M \end{pmatrix}, $$

where $M^n \to 0$. Then in the same way as the two-state example,


$$ \lim_{n \to \infty} P^n = \lim_{n \to \infty} Q D^n Q^{-1} = Q \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} Q^{-1} = \begin{pmatrix} \bar{\pi} \\ \bar{\pi} \\ \vdots \\ \bar{\pi} \end{pmatrix}. $$


This leads us to ask which matrices satisfy (1.9) and (1.10). The Perron— 
Frobenius Theorem from linear algebra gives one large class of matrices for 
which this is true. Suppose that P is a stochastic matrix such that all of 
the entries are strictly positive. Then the Perron–Frobenius Theorem implies
that: 1 is a simple eigenvalue for P; the left eigenvector of 1 can be chosen 
to have all positive entries (and hence can be made into a probability vector 
by multiplying by an appropriate constant); and all the other eigenvalues 
have absolute value strictly less than 1. We sketch a proof of this theorem in 
Exercise 1.20. 

While this includes a large number of matrices, it does not cover all stochastic
matrices with the appropriate limit behavior. For example, consider the
matrix $P$ in (1.7). Although $P$ does not have all positive entries, note that

\[ P^2 = \begin{pmatrix} .594 & .354 & .052 \\ .177 & .510 & .312 \\ .021 & .250 & .729 \end{pmatrix}, \]

and hence $P^2$ satisfies the conditions of the theorem. Therefore, 1 is a simple
eigenvalue for $P^2$ with invariant probability $\bar{\pi}$ and the other eigenvalues of $P^2$
have absolute value strictly less than 1. Since the eigenvalues for $P^2$ are the
squares of the eigenvalues of $P$, and eigenvectors of $P$ are eigenvectors of $P^2$,
we see that $P$ also satisfies (1.9) and (1.10). We then get a general rule.


Fact. If $P$ is a stochastic matrix such that for some $n$, $P^n$ has all entries
strictly positive, then $P$ satisfies (1.9) and (1.10).
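The hypothesis of this rule is easy to test by machine. The following is a minimal sketch, not part of the original text: the function names and the search bound `max_n` are illustrative choices, and the 3 x 3 matrix is an assumed example chosen so that its square reproduces the entries of $P^2$ displayed above.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_positive_power(P, max_n=50):
    """Return the smallest n <= max_n with every entry of P^n > 0, else None."""
    Pn = P
    for n in range(1, max_n + 1):
        if all(x > 0 for row in Pn for x in row):
            return n
        Pn = mat_mul(Pn, P)
    return None

# Assumed example matrix (it has two zero entries)
P = [[F(3, 4), F(1, 4), F(0)],
     [F(1, 8), F(2, 3), F(5, 24)],
     [F(0), F(1, 6), F(5, 6)]]

n = first_positive_power(P)
P2 = mat_mul(P, P)
print(n)
for row in P2:
    print([round(float(x), 3) for x in row])
```

Exact fractions are used so that the test for a strictly positive entry is not affected by rounding.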


In the next section we classify all stochastic matrices $P$ that have the property
that $P^n$ has all positive entries for some $n$.


1.3 Classification of States


In this section we investigate under what conditions on a stochastic matrix
$P$ we can conclude that $P^n$ has all positive entries for some sufficiently large
$n$. We start by considering some examples where this is not true.




Example 1. Simple random walk with reflecting boundary on $\{0, \ldots, 4\}$. In
this case,

\[ P = \begin{array}{c|ccccc}
    & 0 & 1 & 2 & 3 & 4 \\ \hline
  0 & 0 & 1 & 0 & 0 & 0 \\
  1 & 1/2 & 0 & 1/2 & 0 & 0 \\
  2 & 0 & 1/2 & 0 & 1/2 & 0 \\
  3 & 0 & 0 & 1/2 & 0 & 1/2 \\
  4 & 0 & 0 & 0 & 1 & 0
\end{array} \]


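If one takes powers of this matrix, one quickly sees that $P^n$ looks different
depending on whether $n$ is even or odd. For large $n$, if $n$ is even,

\[ P^n \approx \begin{pmatrix}
  .25 & 0 & .50 & 0 & .25 \\
  0 & .50 & 0 & .50 & 0 \\
  .25 & 0 & .50 & 0 & .25 \\
  0 & .50 & 0 & .50 & 0 \\
  .25 & 0 & .50 & 0 & .25
\end{pmatrix}, \]

whereas if $n$ is odd,

\[ P^n \approx \begin{pmatrix}
  0 & .50 & 0 & .50 & 0 \\
  .25 & 0 & .50 & 0 & .25 \\
  0 & .50 & 0 & .50 & 0 \\
  .25 & 0 & .50 & 0 & .25 \\
  0 & .50 & 0 & .50 & 0
\end{pmatrix}. \]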


It is easy to see why there should be many zeroes in $P^n$. At each step, the
random walker moves from an “even” site to an “odd” site or vice versa.
If the walker starts on an even site, then after an even number of steps the
walker will be on an even site, i.e., $p_n(i,j) = 0$ if $i$ is even, $j$ is odd, $n$ is even.
Similarly, after an odd number of steps, a walker who started on an even point
will be at an odd point. In this example we say that $P$ has period 2.
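The even/odd pattern can be reproduced by direct matrix multiplication. This is only an illustrative sketch: the helper names `mat_mul` and `mat_pow` are ours, and the exponents 100 and 101 are arbitrary stand-ins for "large even $n$" and "large odd $n$".

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """Compute P^n by repeated multiplication, starting from the identity."""
    size = len(P)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Reflecting random walk on {0, ..., 4} from Example 1
P = [[0, 1, 0, 0, 0],
     [0.5, 0, 0.5, 0, 0],
     [0, 0.5, 0, 0.5, 0],
     [0, 0, 0.5, 0, 0.5],
     [0, 0, 0, 1, 0]]

even = mat_pow(P, 100)
odd = mat_pow(P, 101)
for name, M in (("n = 100", even), ("n = 101", odd)):
    print(name)
    for row in M:
        print([round(x, 2) for x in row])
```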


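Example 2. Simple random walk with absorbing boundary on $\{0, \ldots, 4\}$.
Here,

\[ P = \begin{array}{c|ccccc}
    & 0 & 1 & 2 & 3 & 4 \\ \hline
  0 & 1 & 0 & 0 & 0 & 0 \\
  1 & 1/2 & 0 & 1/2 & 0 & 0 \\
  2 & 0 & 1/2 & 0 & 1/2 & 0 \\
  3 & 0 & 0 & 1/2 & 0 & 1/2 \\
  4 & 0 & 0 & 0 & 0 & 1
\end{array} \]

If $n$ is large, we see that

\[ P^n \approx \begin{pmatrix}
  1 & 0 & 0 & 0 & 0 \\
  .75 & 0 & 0 & 0 & .25 \\
  .50 & 0 & 0 & 0 & .50 \\
  .25 & 0 & 0 & 0 & .75 \\
  0 & 0 & 0 & 0 & 1
\end{pmatrix}. \]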

In this case the random walker eventually gets to 0 or 4 and then stays at
that state forever. Look at the second row and observe that $p_n(1,0) \to 3/4$
and $p_n(1,4) \to 1/4$. This implies that the probability that a random walker
starting at 1 will eventually stick at 0 is 3/4, whereas with probability 1/4
she eventually sticks at 4. We will call states such as 1, 2, 3 transient states of
the Markov chain.


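Example 3. Suppose $S = \{1, 2, 3, 4, 5\}$ and

\[ P = \begin{array}{c|ccccc}
    & 1 & 2 & 3 & 4 & 5 \\ \hline
  1 & 1/2 & 1/2 & 0 & 0 & 0 \\
  2 & 1/6 & 5/6 & 0 & 0 & 0 \\
  3 & 0 & 0 & 3/4 & 1/4 & 0 \\
  4 & 0 & 0 & 1/8 & 2/3 & 5/24 \\
  5 & 0 & 0 & 0 & 1/6 & 5/6
\end{array} \]

For large $n$,

\[ P^n \approx \begin{pmatrix}
  .25 & .75 & 0 & 0 & 0 \\
  .25 & .75 & 0 & 0 & 0 \\
  0 & 0 & .182 & .364 & .455 \\
  0 & 0 & .182 & .364 & .455 \\
  0 & 0 & .182 & .364 & .455
\end{pmatrix}. \]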

In this case the chain splits into two smaller, noninteracting chains: a chain 
with state space {1,2} and a chain with state space {3, 4,5}. Each “subchain” 
converges to an equilibrium distribution, but one cannot change from a state 
in {1,2} to a state in {3,4,5}. We call such a system a reducible Markov 
chain. 


The main goal of this section is to show that the above examples illustrate 
all the ways that a Markov chain can fail to satisfy (1.9) and (1.10). 


1.3.1 Reducibility 


We say two states $i$ and $j$ of a Markov chain communicate with each other,
written $i \leftrightarrow j$, if there exist $m, n \ge 0$ such that $p_m(i,j) > 0$ and $p_n(j,i) > 0$.
In other words, two states communicate if and only if each state has a positive
probability of eventually being reached by a chain starting in the other state.

The relation $\leftrightarrow$ is an equivalence relation on the state space, i.e., it is: reflexive,
$i \leftrightarrow i$ [since $p_0(i,i) = 1 > 0$]; symmetric, $i \leftrightarrow j$ implies that $j \leftrightarrow i$ (this is
immediate from the definition); and transitive, $i \leftrightarrow j$ and $j \leftrightarrow k$ imply $i \leftrightarrow k$.
To see that transitivity holds, note that if $p_{m_1}(i,j) > 0$ and $p_{m_2}(j,k) > 0$
then

\[ \begin{aligned}
p_{m_1+m_2}(i,k) &= P\{X_{m_1+m_2} = k \mid X_0 = i\} \\
  &\ge P\{X_{m_1+m_2} = k, X_{m_1} = j \mid X_0 = i\} \\
  &= P\{X_{m_1} = j \mid X_0 = i\} \, P\{X_{m_1+m_2} = k \mid X_{m_1} = j\} \\
  &= p_{m_1}(i,j) \, p_{m_2}(j,k) > 0,
\end{aligned} \]

and similarly $p_{n_1}(j,i) > 0$, $p_{n_2}(k,j) > 0$ imply $p_{n_1+n_2}(k,i) > 0$. This equivalence
relation partitions the state space into disjoint sets called communication
classes. For example, in Example 3 of this section there are two communication
classes $\{1, 2\}$ and $\{3, 4, 5\}$.


If there is only one communication class, i.e., if for all $i, j$ there exists an
$n = n(i,j)$ with $p_n(i,j) > 0$, then the chain is called irreducible. Any matrix
satisfying (1.9) and (1.10) is irreducible. However, one can also check that
Example 1 of this section is also irreducible. Example 2 has three communication
classes, $\{0\}$, $\{1, 2, 3\}$, and $\{4\}$. In this example, if the chain starts in the
class $\{1, 2, 3\}$, then with probability 1 it eventually leaves this class and never
returns. Classes with this property are called transient classes and the states
are called transient states. Other classes are called recurrent classes with
recurrent states. A Markov chain starting in a recurrent class never leaves that
class.
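The partition into communication classes can be computed mechanically. The sketch below is one possible approach, not the text's own method: it finds which states can reach which with a standard transitive-closure pass and groups states that reach each other. The function names are hypothetical, states are indexed from 0, and a class is labeled recurrent when it is closed (no state outside the class is reachable from it), which for a finite chain agrees with the definition above.

```python
def reachable(P):
    """reach[i][j] is True when p_n(i, j) > 0 for some n >= 0."""
    n = len(P)
    reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):                      # Warshall's transitive closure
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = True
    return reach

def communication_classes(P):
    """Return a list of (class, is_recurrent) pairs; states are 0-based indices."""
    n = len(P)
    reach = reachable(P)
    seen, classes = set(), []
    for i in range(n):
        if i in seen:
            continue
        cls = [j for j in range(n) if reach[i][j] and reach[j][i]]
        seen.update(cls)
        # Closed class: everything reachable from the class lies inside it
        closed = all(j in cls for s in cls for j in range(n) if reach[s][j])
        classes.append((cls, closed))
    return classes

# Example 2: absorbing boundary on {0, ..., 4}
P_absorbing = [[1, 0, 0, 0, 0],
               [0.5, 0, 0.5, 0, 0],
               [0, 0.5, 0, 0.5, 0],
               [0, 0, 0.5, 0, 0.5],
               [0, 0, 0, 0, 1]]

# Example 3: index k stands for state k + 1
P_reducible = [[1/2, 1/2, 0, 0, 0],
               [1/6, 5/6, 0, 0, 0],
               [0, 0, 3/4, 1/4, 0],
               [0, 0, 1/8, 2/3, 5/24],
               [0, 0, 0, 1/6, 5/6]]

classes_absorbing = communication_classes(P_absorbing)
classes_reducible = communication_classes(P_reducible)
print(classes_absorbing)
print(classes_reducible)
```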


Suppose $P$ is the matrix for a reducible Markov chain with recurrent communication
classes $R_1, \ldots, R_r$ and transient classes $T_1, \ldots, T_s$. It is easy to
see that there must be at least one recurrent class. For each recurrent class
$R$, the submatrix of $P$ obtained from considering only the rows and columns
for states in $R$ is a stochastic matrix. Hence we can write $P$ in the following
form (after, perhaps, reordering the states):

\[ P = \left( \begin{array}{cccc|c}
  P_1 & & & & \\
  & P_2 & & & 0 \\
  & & \ddots & & \\
  & & & P_r & \\ \hline
  \multicolumn{4}{c|}{S} & Q
\end{array} \right), \]

where $P_k$ is the matrix associated with $R_k$. Then,

\[ P^n = \left( \begin{array}{cccc|c}
  P_1^n & & & & \\
  & P_2^n & & & 0 \\
  & & \ddots & & \\
  & & & P_r^n & \\ \hline
  \multicolumn{4}{c|}{S_n} & Q^n
\end{array} \right), \]

for some matrix $S_n$. To analyze the large time behavior of the Markov chain
on the class $R_k$ we need only consider the matrix $P_k$. We discuss the behavior
of $Q^n$ in Section 1.5.


1.3.2 Periodicity 


Suppose that $P$ is the matrix for an irreducible Markov chain (if $P$ is
reducible we can consider separately each of the recurrent communication
classes). We define the period of a state $i$, $d = d(i)$, to be the greatest common
divisor of

\[ J_i = \{ n \ge 0 : p_n(i,i) > 0 \}. \]

In Example 1 of this section, the period of each state is 2; in fact, in this case
$p_{2n}(i,i) > 0$ and $p_{2n+1}(i,i) = 0$ for all $n, i$.

Suppose $J$ is any nonempty subset of the nonnegative integers that is closed
under addition, i.e., $m, n \in J \Rightarrow m + n \in J$. An example of such a $J$ is the set
$J_i$ since $p_{m+n}(i,i) \ge p_m(i,i) \, p_n(i,i)$. Let $d$ be the greatest common divisor
of the elements of $J$. Then $J \subset \{0, d, 2d, \ldots\}$. Moreover, it can be shown
(Exercise 1.21) that $J$ must contain all but a finite number of the elements of
$\{0, d, 2d, \ldots\}$, i.e., there is some $M$ such that $md \in J$ for all $m > M$. Hence
$J_i$ contains $md$ for all $m$ greater than some $M = M_i$. If $j$ is another state
and $m, n$ are such that $p_m(i,j) > 0$, $p_n(j,i) > 0$, then $m + n \in J_i$, $m + n \in J_j$.
Hence $m + n = kd$ for some integer $k$. Also, if $l \in J_j$, then

\[ p_{m+l+n}(i,i) \ge p_m(i,j) \, p_l(j,j) \, p_n(j,i) > 0, \]

and so $d$ divides $l$. We have just shown that if $d$ divides every element of $J_i$
then it divides every element of $J_j$. From this we see that all states have the
same period and hence we can talk about the period of $P$. (We have used the
fact that $P$ is irreducible. If $P$ is reducible, it is possible for states in different
communication classes to have different periods.)
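The period of a state can be estimated numerically by taking the greatest common divisor of the return times found up to some cutoff. This is a hedged sketch rather than a definitive routine: the function name `period` and the default cutoff `2 * size * size` are our own choices, and the cutoff is assumed (not proved here) to be generous enough for small chains such as the examples in this section.

```python
from math import gcd

def period(P, i, max_n=None):
    """gcd of the times n >= 1 with p_n(i, i) > 0, scanning n up to max_n."""
    size = len(P)
    if max_n is None:
        max_n = 2 * size * size        # assumed-sufficient cutoff for small chains
    # support[j] records whether p_n(i, j) > 0 for the current n
    support = [j == i for j in range(size)]
    d = 0
    for n in range(1, max_n + 1):
        support = [any(support[k] and P[k][j] > 0 for k in range(size))
                   for j in range(size)]
        if support[i]:
            d = gcd(d, n)              # gcd(0, n) == n, so the first return sets d
    return d

# Example 1 of this section: reflecting boundary on {0, ..., 4}
P_reflecting = [[0, 1, 0, 0, 0],
                [0.5, 0, 0.5, 0, 0],
                [0, 0.5, 0, 0.5, 0],
                [0, 0, 0.5, 0, 0.5],
                [0, 0, 0, 1, 0]]

periods = [period(P_reflecting, i) for i in range(5)]
print(periods)
```

Only the pattern of positive entries is tracked, so no floating-point powers of $P$ are needed.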


Example 4. Consider simple random walk on a graph (see Example 5, Section 1.1).
The chain is irreducible if and only if the graph is connected, i.e.,
if any two vertices can be connected by a path of edges in the graph. Every
vertex in a connected graph (with at least two vertices) is adjacent to at least
one other point. If $v \sim w$ then $p_2(v,v) \ge p_1(v,w) \, p_1(w,v) > 0$. Therefore,
the period is either 1 or 2. It is easy to see that the period is 2 if and only if
the graph is bipartite, i.e., if and only if the vertices can be partitioned into
two disjoint sets $V_1$, $V_2$ such that all edges of the graph connect one vertex
of $V_1$ and one vertex of $V_2$. Note that symmetric random walk with reflecting
boundaries gives an example of simple random walk on a bipartite graph.


1.3.3 Irreducible, aperiodic chains 


We call an irreducible matrix $P$ aperiodic if $d = 1$. What we will show
now is the following: if $P$ is irreducible and aperiodic, then there exists an
$M > 0$ such that for all $n \ge M$, $P^n$ has all entries strictly positive. To see
this, take any $i, j$. Since $P$ is irreducible there exists some $m(i,j)$ such that
$p_{m(i,j)}(i,j) > 0$. Moreover, since $P$ is aperiodic, there exists some $M(i)$ such
that for all $n \ge M(i)$, $p_n(i,i) > 0$. Hence for all $n \ge M(i)$,

\[ p_{n+m(i,j)}(i,j) \ge p_n(i,i) \, p_{m(i,j)}(i,j) > 0. \]

Let $M$ be the maximum value of $M(i) + m(i,j)$ over all pairs $(i,j)$ (the
maximum exists since the state space is finite). Then $p_n(i,j) > 0$ for all
$n \ge M$ and all $i, j$. Using the rule at the end of Section 1.2 we can now
summarize with the following theorem.


Theorem. If $P$ is the transition matrix for an irreducible, aperiodic Markov
chain, then there exists a unique invariant probability vector $\bar{\pi}$ satisfying

\[ \bar{\pi} P = \bar{\pi}. \]

If $\bar{\phi}$ is any initial probability vector,

\[ \lim_{n \to \infty} \bar{\phi} P^n = \bar{\pi}. \]

Moreover, $\pi(i) > 0$ for each $i$.
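The convergence in the theorem can be observed numerically. In the sketch below, the function names and the choice of 200 steps are illustrative, and the 3-state matrix is an assumed irreducible, aperiodic example (the $\{3, 4, 5\}$ block of Example 3 in this section). Two different starting vectors are pushed forward and compared.

```python
def step(phi, P):
    """One step of the row-vector update phi -> phi P."""
    n = len(P)
    return [sum(phi[i] * P[i][j] for i in range(n)) for j in range(n)]

def limit_distribution(phi, P, steps=200):
    """Approximate lim phi P^n by applying `steps` updates."""
    for _ in range(steps):
        phi = step(phi, P)
    return phi

# Assumed example: the {3, 4, 5} block of Example 3
P = [[3/4, 1/4, 0],
     [1/8, 2/3, 5/24],
     [0, 1/6, 5/6]]

from_first = limit_distribution([1.0, 0.0, 0.0], P)
from_last = limit_distribution([0.0, 0.0, 1.0], P)
print([round(x, 3) for x in from_first])
print([round(x, 3) for x in from_last])
```

Both runs should agree to many digits with $(2/11, 4/11, 5/11)$, the vector that appears (rounded to .182, .364, .455) in the rows of $P^n$ for Example 3.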


1.3.4  Reducible or periodic chains 


We finish this section by discussing how $P^n$ behaves when $P$ is not irreducible
and aperiodic. First, assume $P$ is reducible with recurrent classes
$R_1, \ldots, R_r$ and transient classes $T_1, \ldots, T_s$. Each recurrent class acts as a
small Markov chain; hence, there exist $r$ different invariant probability vectors
$\bar{\pi}^1, \ldots, \bar{\pi}^r$ with $\bar{\pi}^k$ concentrated on $R_k$ ($\pi^k(i) = 0$ if $i \notin R_k$). In other
words, the eigenvalue 1 has multiplicity $r$ with one eigenvector for each recurrent
class. Assume, for ease, that the submatrix $P_k$ for each recurrent class
is aperiodic. Then if $i \in R_k$,

\[ \lim_{n \to \infty} p_n(i,j) = \pi^k(j), \quad j \in R_k, \]

and $p_n(i,j) = 0$ for all $n$ if $j \notin R_k$, since a chain starting in a recurrent class
never leaves that class.

If $i$ is any transient state, then the chain starting at $i$ eventually ends up in a
recurrent state. This means that for each transient state $j$,

\[ \lim_{n \to \infty} p_n(i,j) = 0. \]

Let $\alpha_k(i)$, $k = 1, \ldots, r$ be the probability that the chain starting in state $i$
eventually ends up in recurrent class $R_k$ [in Section 1.5 we will discuss how
to calculate $\alpha_k(i)$]. Once the chain reaches a state in $R_k$ it will settle down
to the equilibrium distribution on $R_k$. From this we see that if $j \in R_k$,

\[ \lim_{n \to \infty} p_n(i,j) = \alpha_k(i) \, \pi^k(j). \]

If $\bar{\phi}$ is an initial probability vector,

\[ \lim_{n \to \infty} \bar{\phi} P^n \]

exists but depends on $\bar{\phi}$.

Suppose now that $P$ is irreducible but has period $d > 1$. In this case the
state space splits nicely into $d$ sets, $A_1, \ldots, A_d$, such that the chain always
moves from $A_i$ to $A_{i+1}$ (or $A_d$ to $A_1$). To illustrate the large-time behavior
of $P^n$, we will consider Example 1 of this section which has period 2. Let

\[ P = \begin{pmatrix}
  0 & 1 & 0 & 0 & 0 \\
  1/2 & 0 & 1/2 & 0 & 0 \\
  0 & 1/2 & 0 & 1/2 & 0 \\
  0 & 0 & 1/2 & 0 & 1/2 \\
  0 & 0 & 0 & 1 & 0
\end{pmatrix}. \]

The eigenvalues for $P$ are $1, -1, 0, 1/\sqrt{2}, -1/\sqrt{2}$. The eigenvalue 1 is simple
and there is a unique invariant probability $\bar{\pi} = (1/8, 1/4, 1/4, 1/4, 1/8)$. However,
when powers of $P$ are taken the eigenvector for $-1$ becomes important
as well as $\bar{\pi}$. We can diagonalize $P$,


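\[ D = Q^{-1} P Q, \]

where

\[ Q = \begin{pmatrix}
  1 & -1/2 & 1/4 & -1 & \sqrt{2}/4 \\
  1 & 1/2 & 0 & -\sqrt{2}/2 & -1/4 \\
  1 & -1/2 & -1/4 & 0 & 0 \\
  1 & 1/2 & 0 & \sqrt{2}/2 & 1/4 \\
  1 & -1/2 & 1/4 & 1 & -\sqrt{2}/4
\end{pmatrix}, \]

\[ Q^{-1} = \begin{pmatrix}
  1/8 & 1/4 & 1/4 & 1/4 & 1/8 \\
  -1/4 & 1/2 & -1/2 & 1/2 & -1/4 \\
  1 & 0 & -2 & 0 & 1 \\
  -1/4 & -\sqrt{2}/4 & 0 & \sqrt{2}/4 & 1/4 \\
  \sqrt{2}/2 & -1 & 0 & 1 & -\sqrt{2}/2
\end{pmatrix}, \]

\[ D = \begin{pmatrix}
  1 & 0 & 0 & 0 & 0 \\
  0 & -1 & 0 & 0 & 0 \\
  0 & 0 & 0 & 0 & 0 \\
  0 & 0 & 0 & 1/\sqrt{2} & 0 \\
  0 & 0 & 0 & 0 & -1/\sqrt{2}
\end{pmatrix}. \]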


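We then see that for $P^n$, the eigenvectors for the three eigenvalues with absolute
value less than 1 become irrelevant and for large $n$

\[ P^n \approx \begin{pmatrix}
  1/8 & 1/4 & 1/4 & 1/4 & 1/8 \\
  1/8 & 1/4 & 1/4 & 1/4 & 1/8 \\
  1/8 & 1/4 & 1/4 & 1/4 & 1/8 \\
  1/8 & 1/4 & 1/4 & 1/4 & 1/8 \\
  1/8 & 1/4 & 1/4 & 1/4 & 1/8
\end{pmatrix}
+ (-1)^n \begin{pmatrix}
  1/8 & -1/4 & 1/4 & -1/4 & 1/8 \\
  -1/8 & 1/4 & -1/4 & 1/4 & -1/8 \\
  1/8 & -1/4 & 1/4 & -1/4 & 1/8 \\
  -1/8 & 1/4 & -1/4 & 1/4 & -1/8 \\
  1/8 & -1/4 & 1/4 & -1/4 & 1/8
\end{pmatrix}. \]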


The asymptotic value for $P^n$ varies depending on whether $n$ is even or odd.
In this case the invariant probability at a state $i$, $\pi(i)$, does not represent the
limit of $p_n(j,i)$. However, it does represent the average amount of time that
is spent in site $i$. In fact, one can check that for large $n$, the average of $p_n(j,i)$
and $p_{n+1}(j,i)$ approaches $\pi(i)$ for each initial state $j$,

\[ \pi(i) = \lim_{n \to \infty} \frac{1}{2} \left[ p_n(j,i) + p_{n+1}(j,i) \right]. \]

In general, if $P$ is irreducible with period $d$, $P$ will have $d$ eigenvalues with
absolute value 1, the $d$ complex numbers $z$ with $z^d = 1$. Each is simple;
in particular, the eigenvalue 1 is simple and there exists a unique invariant
probability $\bar{\pi}$. Given any initial probability distribution $\bar{\phi}$, for large $n$, $\bar{\phi} P^n$
will cycle through $d$ different distributions, but they will average to $\bar{\pi}$,

\[ \lim_{n \to \infty} \frac{1}{d} \left[ \bar{\phi} P^{n+1} + \cdots + \bar{\phi} P^{n+d} \right] = \bar{\pi}. \]


1.4 Return Times 


Let $X_n$ be an irreducible (but perhaps periodic) Markov chain with transition
matrix $P$. Consider the amount of time spent in state $j$ up to and
including time $n$,

\[ Y(j,n) = \sum_{m=0}^{n} I\{X_m = j\}. \]

Here we write $I$ to denote the “indicator function” of an event, i.e., the random
variable which equals 1 if the event occurs and 0 otherwise. If $\bar{\pi}$ denotes the
invariant probability distribution for $P$, then it follows from the results of the
previous sections that

\[ \lim_{n \to \infty} \frac{1}{n+1} E(Y(j,n) \mid X_0 = i)
   = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^{n} P\{X_m = j \mid X_0 = i\}
   = \pi(j), \]

i.e., $\pi(j)$ represents the fraction of time that the chain spends in state $j$. In
this section we relate $\pi(j)$ to the first return time to the state $j$.

Fix a state $i$ and assume that $X_0 = i$. Let $T$ be the first time after 0 that
the Markov chain is in state $i$,

\[ T = \min\{ n \ge 1 : X_n = i \}. \]

Since the chain is irreducible, we know that $T < \infty$ with probability 1. In
fact (see Exercise 1.7) it is not too difficult to show that $E(T) < \infty$.

Consider the time until the $k$th return to the state $i$. This time is given by a
sum of independent random variables, $T_1 + \cdots + T_k$, each with the distribution
of $T$. Here, $T_m$ denotes the time between the $(m-1)$st and $m$th return. For
$k$ large, the law of large numbers tells us that

\[ \frac{1}{k} (T_1 + \cdots + T_k) \approx E(T), \]

i.e., there are about $k$ visits to the state $i$ in $k E(T)$ steps of the chain. But
we have already seen that in $n$ steps we expect about $n \pi(i)$ visits to the state
$i$. Hence setting $n = k E(T)$ we get the relation

\[ E(T) = \frac{1}{\pi(i)}. \qquad (1.11) \]

This says that the expected number of steps to return to $i$, assuming that the
chain starts at $i$, is given by the reciprocal of the invariant probability. The
above argument is, of course, not completely rigorous, but it does not take
too much work to supply the details to prove that (1.11) always holds. See
Exercise 1.15 for another derivation of (1.11).


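Example. Consider the two-state Markov chain with $S = \{0, 1\}$ and

\[ P = \begin{array}{c|cc}
    & 0 & 1 \\ \hline
  0 & 1-p & p \\
  1 & q & 1-q
\end{array}, \qquad 0 < p, q < 1. \]

Assume the chain starts in state 0 and let $T$ be the return time to 0. In
Section 1.2, we showed that $\bar{\pi} = (q/(p+q), p/(p+q))$ and hence

\[ E(T) = \frac{1}{\pi(0)} = \frac{p+q}{q}. \qquad (1.12) \]

In this example we can write down the distribution for $T$ explicitly and verify
(1.12). For $n \ge 2$,

\[ P\{T \ge n\} = P\{X_1 = 1, \ldots, X_{n-1} = 1 \mid X_0 = 0\} = p (1-q)^{n-2}. \]

If $Y$ is any random variable taking values in the nonnegative integers,

\[ E(Y) = \sum_{n=1}^{\infty} n P\{Y = n\} = \sum_{n=1}^{\infty} \sum_{k=1}^{n} P\{Y = n\}
        = \sum_{k=1}^{\infty} \sum_{n=k}^{\infty} P\{Y = n\} = \sum_{k=1}^{\infty} P\{Y \ge k\}. \qquad (1.13) \]

Therefore,

\[ E(T) = \sum_{n=1}^{\infty} n P\{T = n\} = \sum_{n=1}^{\infty} P\{T \ge n\}
        = 1 + \sum_{n=2}^{\infty} p (1-q)^{n-2} = 1 + \frac{p}{q} = \frac{p+q}{q}. \]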


It should be pointed out that (1.11) only gives the expected value of the
random variable $T$ and says nothing else about its distribution. In general,
one can say very little else about the distribution of $T$ given only the invariant
probability $\bar{\pi}$. To illustrate this, consider the two-state example above with
$p = q$ so that $E(T) = 2$. If $p$ is close to 1, then $T = 2$ most of the time
and $\mathrm{Var}(T)$ is small. If $p$ is close to 0, then $T = 1$ most of the time, but
occasionally $T$ takes on a very high value. In this case, $\mathrm{Var}(T)$ is large.
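The contrast between the two cases can be made concrete by summing the distribution of $T$ directly. The sketch below is an illustration under stated assumptions: it uses $P\{T = 1\} = 1 - p$ and $P\{T = n\} = p(1-q)^{n-2} q$ for $n \ge 2$ (which follows from the tail formula in the example above), truncates the sums at a hypothetical cutoff `n_max`, and the function name is ours.

```python
def return_time_moments(p, q, n_max=20000):
    """Mean and variance of the return time T to state 0, by truncated sums."""
    mean = 1 - p               # contribution of n = 1
    second = 1 - p             # contribution of n = 1 to E(T^2)
    prob = p * q               # P{T = 2}
    for n in range(2, n_max + 1):
        mean += n * prob
        second += n * n * prob
        prob *= 1 - q          # P{T = n + 1} = P{T = n} (1 - q)
    return mean, second - mean ** 2

mean_hi, var_hi = return_time_moments(0.9, 0.9)      # p = q close to 1
mean_lo, var_lo = return_time_moments(0.01, 0.01)    # p = q close to 0
print(round(mean_hi, 6), round(var_hi, 6))
print(round(mean_lo, 6), round(var_lo, 6))
```

In both runs the mean should come out as 2, in agreement with (1.12), while the variances differ by several orders of magnitude.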

In the next section, we discuss how to compute the expected number of
steps from $i$ to $j$ when $i \ne j$.


1.5 Transient States 


Let $P$ be the transition matrix for a Markov chain $X_n$. Recall that a state
$i$ is called transient if with probability 1 the chain visits $i$ only a finite number
of times. Suppose $P$ has some transient states and let $Q$ be the submatrix of
$P$ which includes only the rows and columns for the transient states. Hence
(after rearranging the order of the states) we can write

\[ P = \left( \begin{array}{c|c} \tilde{P} & 0 \\ \hline S & Q \end{array} \right), \]

where $\tilde{P}$ is the block of rows and columns for the recurrent states.

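As an example, we consider the random walk with absorbing boundaries (Example 2,
Section 1.3). We order the state space $S = \{0, 4, 1, 2, 3\}$ so that we
can write

\[ P = \begin{array}{c|cc|ccc}
    & 0 & 4 & 1 & 2 & 3 \\ \hline
  0 & 1 & 0 & 0 & 0 & 0 \\
  4 & 0 & 1 & 0 & 0 & 0 \\ \hline
  1 & 1/2 & 0 & 0 & 1/2 & 0 \\
  2 & 0 & 0 & 1/2 & 0 & 1/2 \\
  3 & 0 & 1/2 & 0 & 1/2 & 0
\end{array}, \qquad
Q = \begin{array}{c|ccc}
    & 1 & 2 & 3 \\ \hline
  1 & 0 & 1/2 & 0 \\
  2 & 1/2 & 0 & 1/2 \\
  3 & 0 & 1/2 & 0
\end{array}. \qquad (1.14) \]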


The matrix $Q$ is a substochastic matrix, i.e., a matrix with nonnegative
entries whose row sums are less than or equal to 1. Since the states represented
by $Q$ are transient, $Q^n \to 0$. This implies that all of the eigenvalues of $Q$
have absolute values strictly less than 1. Hence, $I - Q$ is an invertible matrix
and there is no problem in defining the matrix

\[ M = (I - Q)^{-1}. \]


Let $i$ be a transient state and consider $Y_i$, the total number of visits to $i$,

\[ Y_i = \sum_{n=0}^{\infty} I\{X_n = i\}. \]

Since $i$ is transient, $Y_i < \infty$ with probability 1. Suppose $X_0 = j$, where $j$ is
another transient state. Then,

\[ \begin{aligned}
E(Y_i \mid X_0 = j) &= E\left[ \sum_{n=0}^{\infty} I\{X_n = i\} \,\Big|\, X_0 = j \right] \\
  &= \sum_{n=0}^{\infty} P\{X_n = i \mid X_0 = j\} \\
  &= \sum_{n=0}^{\infty} p_n(j,i).
\end{aligned} \]

In other words, $E(Y_i \mid X_0 = j)$ is the $(j,i)$ entry of the matrix

\[ I + P + P^2 + \cdots \]

which is the same as the $(j,i)$ entry of the matrix $I + Q + Q^2 + \cdots$.
However, a simple calculation shows that

\[ (I + Q + Q^2 + \cdots)(I - Q) = I, \]

or

\[ I + Q + Q^2 + \cdots = (I - Q)^{-1} = M. \]

We have just shown that the expected number of visits to $i$ starting at $j$ is
given by $M_{ji}$, the $(j,i)$ entry of $M$. If we want to compute the expected
number of steps until the chain enters a recurrent class, assuming $X_0 = j$, we
need only sum $M_{ji}$ over all transient states $i$.

In the particular example (1.14),

\[ M = (I - Q)^{-1} = \begin{array}{c|ccc}
    & 1 & 2 & 3 \\ \hline
  1 & 3/2 & 1 & 1/2 \\
  2 & 1 & 2 & 1 \\
  3 & 1/2 & 1 & 3/2
\end{array}. \]

Starting in state 1, the expected number of visits to state 3 before absorption is
1/2, and the expected total number of steps until absorption is $3/2 + 1 + 1/2 = 3$.
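The matrix $M$ for example (1.14) can be recomputed exactly. The following is a sketch only: the helper names `inverse` and `fundamental_matrix` are ours, and Gauss-Jordan elimination over exact fractions is just one convenient way to invert $I - Q$.

```python
from fractions import Fraction as F

def inverse(A):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(A)
    # Augment A with the identity matrix
    aug = [list(row) + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = 1 / aug[col][col]
        aug[col] = [x * scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fundamental_matrix(Q):
    """Return M = (I - Q)^(-1) for a substochastic matrix Q of Fractions."""
    n = len(Q)
    return inverse([[F(int(i == j)) - Q[i][j] for j in range(n)]
                    for i in range(n)])

# Q from (1.14): transient states 1, 2, 3 of the absorbing random walk
Q = [[F(0), F(1, 2), F(0)],
     [F(1, 2), F(0), F(1, 2)],
     [F(0), F(1, 2), F(0)]]

M = fundamental_matrix(Q)
steps_from_1 = sum(M[0])   # expected number of steps until absorption from state 1
print(M)
print(steps_from_1)
```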

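We can also use this technique to determine the expected number of steps
that an irreducible Markov chain takes to go from one state $j$ to another state
$i$. We first write the transition matrix $P$ for the chain with $i$ being the first
site:

\[ P = \left( \begin{array}{c|c} p(i,i) & R \\ \hline S & Q \end{array} \right), \]

where $R$ stands for the remaining entries of the row for $i$.
We then change $i$ to an absorbing site, and hence have the new matrix

\[ \tilde{P} = \left( \begin{array}{c|c} 1 & 0 \\ \hline S & Q \end{array} \right). \]

Let $T_i$ be the number of steps needed to reach state $i$. In other words, $T_i$ is
the smallest time $n$ such that $X_n = i$. For any other state $k$ let $T_{i,k}$ be the
number of visits to $k$ before reaching $i$ (if we start at state $k$, we include this
as one visit to $k$). Then,

\[ E(T_i \mid X_0 = j) = E\left[ \sum_{k \ne i} T_{i,k} \,\Big|\, X_0 = j \right] = \sum_{k \ne i} M_{jk}. \]

In other words, $M \bar{1}$ gives a vector whose $j$th component is the expected number of
steps starting at $j$ until reaching $i$.

Example 1. Suppose $P$ is the matrix for random walk with reflecting boundary,

\[ P = \begin{array}{c|ccccc}
    & 0 & 1 & 2 & 3 & 4 \\ \hline
  0 & 0 & 1 & 0 & 0 & 0 \\
  1 & 1/2 & 0 & 1/2 & 0 & 0 \\
  2 & 0 & 1/2 & 0 & 1/2 & 0 \\
  3 & 0 & 0 & 1/2 & 0 & 1/2 \\
  4 & 0 & 0 & 0 & 1 & 0
\end{array}. \]

If we let $i = 0$, then

\[ Q = \begin{array}{c|cccc}
    & 1 & 2 & 3 & 4 \\ \hline
  1 & 0 & 1/2 & 0 & 0 \\
  2 & 1/2 & 0 & 1/2 & 0 \\
  3 & 0 & 1/2 & 0 & 1/2 \\
  4 & 0 & 0 & 1 & 0
\end{array}, \qquad
M = (I - Q)^{-1} = \begin{array}{c|cccc}
    & 1 & 2 & 3 & 4 \\ \hline
  1 & 2 & 2 & 2 & 1 \\
  2 & 2 & 4 & 4 & 2 \\
  3 & 2 & 4 & 6 & 3 \\
  4 & 2 & 4 & 6 & 4
\end{array}, \]

\[ M \bar{1} = (7, 12, 15, 16). \]

Hence, the expected number of steps to get from 4 to 0 is 16.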
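The vector $M\bar{1}$ can also be approximated without inverting anything, using the series $I + Q + Q^2 + \cdots = M$ derived earlier in this section. The sketch below is illustrative: the function name and the number of terms (5000) are assumptions, chosen large enough that the neglected tail is negligible for this small example.

```python
def expected_steps(Q, terms=5000):
    """Approximate M 1 = (I + Q + Q^2 + ...) 1 by a truncated series."""
    n = len(Q)
    term = [1.0] * n            # Q^0 applied to the all-ones vector
    total = [0.0] * n
    for _ in range(terms):
        total = [t + x for t, x in zip(total, term)]
        # Next term: Q applied to the previous one
        term = [sum(Q[i][j] * term[j] for j in range(n)) for i in range(n)]
    return total

# Q for the reflecting walk with state 0 made absorbing (states 1, 2, 3, 4)
Q = [[0, 0.5, 0, 0],
     [0.5, 0, 0.5, 0],
     [0, 0.5, 0, 0.5],
     [0, 0, 1, 0]]

steps = expected_steps(Q)
print([round(x, 6) for x in steps])
```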


We now suppose that there are at least two different recurrent classes and
ask the question: starting at a given transient state $j$, what is the probability
that the Markov chain eventually ends up in a particular recurrent class? In
order to answer this question, we can assume that the recurrent classes consist
of single points $r_1, \ldots, r_k$ with $p(r_i, r_i) = 1$. If we order the states so that the
recurrent states $r_1, \ldots, r_k$ precede the transient states $t_1, \ldots, t_s$, then

\[ P = \left( \begin{array}{c|c} I & 0 \\ \hline S & Q \end{array} \right). \]

For $i = 1, \ldots, s$, $j = 1, \ldots, k$, let $\alpha(t_i, r_j)$ be the probability that the chain
starting at $t_i$ eventually ends up in recurrent state $r_j$. We set $\alpha(r_i, r_i) = 1$
and $\alpha(r_i, r_j) = 0$ if $i \ne j$. For any transient state $t_i$,

\[ \begin{aligned}
\alpha(t_i, r_j) &= P\{X_n = r_j \text{ eventually} \mid X_0 = t_i\} \\
  &= \sum_{x \in S} P\{X_1 = x \mid X_0 = t_i\} \, P\{X_n = r_j \text{ eventually} \mid X_1 = x\} \\
  &= \sum_{x \in S} p(t_i, x) \, \alpha(x, r_j).
\end{aligned} \]

If $A$ is the $s \times k$ matrix with entries $\alpha(t_i, r_j)$, then the above can be written
in matrix form

\[ A = S + QA, \]

or

\[ A = (I - Q)^{-1} S = MS. \]
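Example 2. Consider a random walk with absorbing boundary on $\{0, \ldots, 4\}$.
If we order the states $\{0, 4, 1, 2, 3\}$ so that the recurrent states precede the
transient states then

\[ S = \begin{array}{c|cc}
    & 0 & 4 \\ \hline
  1 & 1/2 & 0 \\
  2 & 0 & 0 \\
  3 & 0 & 1/2
\end{array}, \qquad
M = \begin{array}{c|ccc}
    & 1 & 2 & 3 \\ \hline
  1 & 3/2 & 1 & 1/2 \\
  2 & 1 & 2 & 1 \\
  3 & 1/2 & 1 & 3/2
\end{array}, \qquad
MS = \begin{array}{c|cc}
    & 0 & 4 \\ \hline
  1 & 3/4 & 1/4 \\
  2 & 1/2 & 1/2 \\
  3 & 1/4 & 3/4
\end{array}. \]

Hence, starting at state 1 the probability that the walk is eventually
absorbed at state 0 is 3/4.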
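The relation $A = S + QA$ also suggests a simple numerical scheme: start from $A = 0$ and apply the right-hand side repeatedly. This is a sketch of that fixed-point iteration, not the matrix-inverse computation used above; the function name and the iteration count are illustrative assumptions.

```python
def absorption_probabilities(S, Q, iterations=200):
    """Iterate A <- S + Q A from A = 0; the limit solves A = S + Q A."""
    s, k = len(S), len(S[0])
    A = [[0.0] * k for _ in range(s)]
    for _ in range(iterations):
        A = [[S[i][j] + sum(Q[i][m] * A[m][j] for m in range(s))
              for j in range(k)]
             for i in range(s)]
    return A

# Rows: transient states 1, 2, 3; columns of S: absorbing states 0, 4
S = [[0.5, 0], [0, 0], [0, 0.5]]
Q = [[0, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0]]

A = absorption_probabilities(S, Q)
for row in A:
    print([round(x, 6) for x in row])
```

After $n$ iterations the iterate equals $(I + Q + \cdots + Q^{n-1})S$, so it converges to $MS$ because $Q^n \to 0$.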


Example 3. Gambler’s Ruin. Consider the random walk with absorbing
boundary on $\{0, \ldots, N\}$. Let $\alpha(j) = \alpha(j, N)$ be the probability that the
walker starting at state $j$ eventually ends up absorbed in state $N$. Clearly,
$\alpha(0) = 0$, $\alpha(N) = 1$. For $0 < j < N$, we can consider one step as above and
note that

\[ \alpha(j) = (1 - p) \, \alpha(j - 1) + p \, \alpha(j + 1). \qquad (1.15) \]

This gives us $N - 1$ linear equations in $N - 1$ unknowns, $\alpha(1), \ldots, \alpha(N - 1)$.
To find the solution, we need to know how to solve linear difference equations.
By (0.5) and (0.6), the general solution of (1.15) is

\[ \alpha(j) = c_1 + c_2 \left( \frac{1-p}{p} \right)^j, \quad p \ne 1/2, \]

\[ \alpha(j) = c_1 + c_2 \, j, \quad p = 1/2. \]

The boundary conditions $\alpha(0) = 0$, $\alpha(N) = 1$ allow us to determine the
constants $c_1, c_2$, so we get

\[ \alpha(j) = \frac{1 - \left( \frac{1-p}{p} \right)^j}{1 - \left( \frac{1-p}{p} \right)^N}, \quad p \ne 1/2, \]

\[ \alpha(j) = \frac{j}{N}, \quad p = 1/2. \qquad (1.16) \]

Note that if $p \le 1/2$, then for any fixed $j$,

\[ \lim_{N \to \infty} \alpha(j) = 0. \]

This says that if a gambler with fixed resources $j$ plays a fair (or unfair) game
in which the gambler wins or loses one unit with each play, then the chance
that a gambler will beat a house with very large resources $N$ is very small.
However, if $p > 1/2$,

\[ \lim_{N \to \infty} \alpha(j) = 1 - \left( \frac{1-p}{p} \right)^j > 0. \]

This says that there is a positive chance that the gambler playing a game in
the gambler’s favor will never lose all the resources and will be able to play
forever.

Suppose $p = 1/2$, and let $T$ be the time it takes for the random walk to
reach 0 or $N$, and let

\[ G(j) = G(j, N) = E[T \mid X_0 = j]. \]

Clearly, $G(0) = 0$, $G(N) = 0$ and by considering one step we can see that

\[ G(j) = 1 + \frac{1}{2} G(j - 1) + \frac{1}{2} G(j + 1), \quad j = 1, \ldots, N - 1. \qquad (1.17) \]

This is an example of an inhomogeneous linear difference equation. One solution
of the equation is given by $G_0(j) = -j^2$. Also, if $G_1, G_2$ are two solutions
to (1.17), we can see that $g = G_1 - G_2$ satisfies the homogeneous equation

\[ g(j) = \frac{1}{2} g(j - 1) + \frac{1}{2} g(j + 1), \quad j = 1, \ldots, N - 1. \]

Using this, we can see that all solutions of (1.17) are of the form

\[ G(j) = -j^2 + c_1 + c_2 \, j. \]

Plugging in the boundary conditions $G(0) = G(N) = 0$ allows us to determine
the constants $c_1, c_2$, and we get

\[ G(j) = -j^2 + N j = j (N - j). \qquad (1.18) \]
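Both closed forms can be checked against their difference equations with exact arithmetic. The sketch below assumes, as in (1.15), that $p$ is the probability of a step from $j$ to $j + 1$; the function names and the test values $N = 10$, $p = 2/5$ are arbitrary illustrations.

```python
from fractions import Fraction as F

def win_probability(j, N, p):
    """alpha(j) from (1.16): probability of reaching N before 0, starting at j."""
    if p == F(1, 2):
        return F(j, N)
    r = (1 - p) / p
    return (1 - r ** j) / (1 - r ** N)

def expected_duration(j, N):
    """G(j) from (1.18): expected time to reach 0 or N in the fair game."""
    return j * (N - j)

N, p = 10, F(2, 5)
alpha = [win_probability(j, N, p) for j in range(N + 1)]
G = [expected_duration(j, N) for j in range(N + 1)]

# Check (1.15) and (1.17) at every interior state
ok_alpha = all(alpha[j] == (1 - p) * alpha[j - 1] + p * alpha[j + 1]
               for j in range(1, N))
ok_G = all(G[j] == 1 + F(1, 2) * (G[j - 1] + G[j + 1]) for j in range(1, N))
print(ok_alpha, ok_G)
print(float(alpha[5]), G[5])
```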


1.6 Examples 


Simple Random Walk on a Graph (Example 5, Section 1.1). Assume
the graph is connected so that the walk is irreducible. Let $e$ denote the total
number of edges in the graph and $d(v)$ the number of edges that have $v$ as
one of their endpoints. Since each edge has two endpoints, the sum of $d(v)$
over the vertices in the graph is $2e$. It is easy to check that

\[ \pi(v) = d(v)/2e, \]

is the invariant probability measure for this chain.
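The check is short enough to carry out by machine on a small case. The graph below is a hypothetical example (four vertices, four edges), and the function names are ours; the code builds $\pi(v) = d(v)/2e$ and applies one step of the walk to it.

```python
from fractions import Fraction as F

# Hypothetical connected graph, given as an adjacency list
graph = {"A": ["B", "C"],
         "B": ["A", "C", "D"],
         "C": ["A", "B"],
         "D": ["B"]}

def degree_distribution(graph):
    """pi(v) = d(v) / 2e, where 2e is the sum of all degrees."""
    two_e = sum(len(nbrs) for nbrs in graph.values())
    return {v: F(len(nbrs), two_e) for v, nbrs in graph.items()}

def step(pi, graph):
    """Apply one step of simple random walk to the distribution pi."""
    out = {v: F(0) for v in graph}
    for v, nbrs in graph.items():
        for w in nbrs:
            out[w] += pi[v] / len(nbrs)   # from v, each neighbor has probability 1/d(v)
    return out

pi = degree_distribution(graph)
print(pi)
print(step(pi, graph) == pi)
```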


Simple Random Walk on a Circle. Let $N \ge 2$ be an integer. We can
consider $\{0, 1, \ldots, N - 1\}$ to be a “circle” by assuming that $N - 1$ is adjacent
to 0 as well as $N - 2$.

Let $X_n$ be simple random walk on the circle. The transition probabilities
are

\[ p(k, k - 1) = p(k - 1, k) = \frac{1}{2}, \quad k = 1, \ldots, N - 1, \]

\[ p(0, N - 1) = p(N - 1, 0) = \frac{1}{2}. \]

The invariant probability is the uniform distribution. Assume that $X_0 = 0$
and let $T_k$ denote the first time at which the number of distinct points visited
equals $k$. Then $T_N$ is the first time that every point has been visited. By
definition $T_1 = 0$, and clearly $T_2 = 1$. We will compute $r(k) = E[T_k - T_{k-1}]$
for $k = 3, \ldots, N$; a little thought will show that the value depends only on $k$
and not on $N$. Note that at time $T_{k-1}$ the chain is at a boundary point so that
one of the neighbors of $X_{T_{k-1}}$ has been visited and the other has not. In the
next step we will either visit the new point or we will go to an interior point.
If we go to the interior point, the random walk has to continue until it reaches
a boundary point and then we start afresh. By (1.18), the expected time that
it takes the random walk from the interior point (next to the boundary point)
to reach a boundary point is $k - 3$. We therefore get the equation

\[ r(k) = 1 + \frac{1}{2} \left[ (k - 3) + r(k) \right], \]

or $r(k) = k - 1$. Therefore,

\[ E[T_N] = 1 + \sum_{k=3}^{N} E[T_k - T_{k-1}] = 1 + \sum_{k=3}^{N} (k - 1) = \frac{N(N - 1)}{2}. \]

We can also ask for the distribution of $X_{T_N}$, the last point to be visited by the
chain. It turns out that the distribution of this random variable is uniform on
$\{1, \ldots, N - 1\}$. We leave the derivation of this fact to the exercises (Exercise
1.16).


Urn Model. Suppose there is an urn with $N$ balls. Each ball is colored
either red or green. In each time period, one ball is chosen at random from
the urn and with probability 1/2 is replaced with a ball of the other color;
otherwise, the ball is returned to the urn. Let $X_n$ denote the number of red
balls after $n$ picks. Then $X_n$ is an irreducible Markov chain with state space
$\{0, \ldots, N\}$. The transition matrix is given by

\[ p(j, j + 1) = \frac{N - j}{2N}, \quad p(j, j - 1) = \frac{j}{2N}, \quad p(j, j) = \frac{1}{2}, \quad j = 0, 1, \ldots, N. \]

One might guess that this chain would tend to keep the number of red balls
and green balls about the same. In fact, the invariant probability is given by
the binomial distribution

\[ \pi(j) = \binom{N}{j} 2^{-N}. \]

It is straightforward to show that this is an invariant probability,

\[ \begin{aligned}
(\bar{\pi} P)(j) &= \sum_{k=0}^{N} \pi(k) \, p(k, j) \\
  &= \pi(j - 1) \, p(j - 1, j) + \pi(j) \, p(j, j) + \pi(j + 1) \, p(j + 1, j) \\
  &= 2^{-N} \left[ \binom{N}{j-1} \frac{N - j + 1}{2N} + \binom{N}{j} \frac{1}{2} + \binom{N}{j+1} \frac{j + 1}{2N} \right] \\
  &= 2^{-N} \left[ \frac{1}{2} \binom{N-1}{j-1} + \frac{1}{2} \binom{N}{j} + \frac{1}{2} \binom{N-1}{j} \right] \\
  &= 2^{-N} \binom{N}{j} = \pi(j).
\end{aligned} \]


Hence the probability distribution in equilibrium for the number of red balls
is the same as the distribution for the number of heads in $N$ flips of a coin.
Recall by the central limit theorem, the number of heads is $N/2$ with a random
fluctuation which is of order $\sqrt{N}$. We could have guessed the invariant
distribution by considering the problem slightly differently: suppose we always
keep the same $N$ balls, but when a ball is chosen we paint it the other
color with probability 1/2. Then in the long run, we would expect the colors
of the $N$ balls to become independent with each ball having probability 1/2
of being red.
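The invariance of the binomial distribution can be confirmed exactly for a particular $N$. This sketch assumes the transition probabilities $p(j, j+1) = (N - j)/2N$, $p(j, j-1) = j/2N$, $p(j, j) = 1/2$ of the urn chain; the function names and the choice $N = 6$ are illustrative.

```python
from fractions import Fraction as F
from math import comb

def urn_matrix(N):
    """Transition matrix of the urn chain on {0, ..., N} (number of red balls)."""
    P = [[F(0)] * (N + 1) for _ in range(N + 1)]
    for j in range(N + 1):
        P[j][j] = F(1, 2)
        if j < N:
            P[j][j + 1] = F(N - j, 2 * N)   # a green ball is picked and recolored
        if j > 0:
            P[j][j - 1] = F(j, 2 * N)       # a red ball is picked and recolored
    return P

def binomial_distribution(N):
    """pi(j) = C(N, j) 2^(-N)."""
    return [F(comb(N, j), 2 ** N) for j in range(N + 1)]

N = 6
P = urn_matrix(N)
pi = binomial_distribution(N)
pi_P = [sum(pi[k] * P[k][j] for k in range(N + 1)) for j in range(N + 1)]
print(pi_P == pi)
```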
Cell Genetics. Consider the following Markov chain which models reproduction
of cells. Suppose each cell contains $N$ particles each of either one of
two types, I or II. Let $j$ be the number of particles of type I. In reproduction,
we assume that the cell duplicates itself and then splits, randomly distributing
the particles. After duplication, the cell has $2j$ particles of type I and
$2(N - j)$ particles of type II. It then selects $N$ of these $2N$ particles for the
next cell. By using the hypergeometric distribution we see that this gives rise
to transition probabilities

\[ p(j, k) = \frac{\binom{2j}{k} \binom{2(N - j)}{N - k}}{\binom{2N}{N}}. \]

This Markov chain has two absorbing states, 0 and $N$. Eventually all cells
will have only particles of type I or of type II.

Suppose we start with a large number of cells each with $j$ particles of type
I. After a long time the population will be full of cells all with one type of
particle. What fraction of these will be all type I? Since the fraction of type I
particles does not change in this procedure we would expect that the fraction
would be $j/N$. In other words, if we let $\alpha(j)$ be the probability that the
Markov chain starting in state $j$ is eventually absorbed in state $N$, then we
expect that

\[ \alpha(j) = \frac{j}{N}. \]

For $1 \le j \le N - 1$ we can, in fact, verify that this choice of $\alpha(j)$ satisfies

\[ \alpha(j) = \sum_{k=0}^{N} p(j, k) \, \alpha(k), \]

and hence gives the absorption probabilities.
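This verification can be done exactly for a given $N$. The sketch below assumes the hypergeometric transition probabilities of the cell genetics chain, $p(j,k) = \binom{2j}{k}\binom{2(N-j)}{N-k} / \binom{2N}{N}$; the function name and the choice $N = 5$ are illustrative.

```python
from fractions import Fraction as F
from math import comb

def transition(j, k, N):
    """Hypergeometric p(j, k): choose N of the 2N particles, k of them of type I."""
    return F(comb(2 * j, k) * comb(2 * (N - j), N - k), comb(2 * N, N))

N = 5
alpha = [F(j, N) for j in range(N + 1)]          # conjectured alpha(j) = j / N
rhs = [sum(transition(j, k, N) * alpha[k] for k in range(N + 1))
       for j in range(N + 1)]
row_sums = [sum(transition(j, k, N) for k in range(N + 1)) for j in range(N + 1)]
print(rhs == alpha)
print(row_sums)
```

The identity holds because the mean of this hypergeometric distribution is $N \cdot 2j / 2N = j$, which is the statement that the expected fraction of type I particles does not change.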


Card Shuffling. Consider a deck of cards numbered $1, \ldots, n$. At each
time we will shuffle the cards by drawing a card at random and placing it at
the top of the deck. This can be thought of as a Markov chain on $S_n$, the
set of permutations of $n$ elements. If $\lambda$ denotes any permutation (one-to-one
correspondence of $\{1, \ldots, n\}$ with itself), and $\nu_j$ denotes the permutation
corresponding to moving the $j$th card to the top of the deck, then the transition
probabilities for this chain are given by

\[ p(\lambda, \nu_j \lambda) = \frac{1}{n}, \quad j = 1, \ldots, n. \]

This chain is irreducible and aperiodic. It is easy to verify that the unique
invariant probability is the uniform measure on $S_n$, the measure that assigns
probability $1/n!$ to each permutation. Therefore, if we start with any ordering
of the cards, after enough moves of this kind the deck will be well shuffled.

A much harder question which we will not discuss in this book is how many
such moves are “enough” so the deck of cards is shuffled. Other questions,
such as the expected number of moves from a given permutation to another
given permutation, theoretically can be answered by the methods described in
this chapter yet cannot be answered from a practical perspective. The reason
is that the transition matrix is $n! \times n!$ which (except for small $n$) is too large
to do the necessary matrix operations.
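For very small $n$ the matrix can still be written down. The sketch below builds it for $n = 3$ (a $6 \times 6$ matrix) and checks that the uniform measure is invariant. It is an illustration only: decks are represented as tuples listed from top to bottom, the card to move is chosen by its position in the deck, and all function names are ours.

```python
from fractions import Fraction as F
from itertools import permutations

def move_to_top(deck, j):
    """Return the deck after moving the card in position j (0-based) to the top."""
    return (deck[j],) + deck[:j] + deck[j + 1:]

def shuffle_matrix(n):
    """Transition matrix of the move-to-top shuffle on all n! orderings."""
    states = list(permutations(range(1, n + 1)))
    index = {s: i for i, s in enumerate(states)}
    P = [[F(0)] * len(states) for _ in states]
    for s in states:
        for j in range(n):                       # each position is chosen w.p. 1/n
            P[index[s]][index[move_to_top(s, j)]] += F(1, n)
    return states, P

states, P = shuffle_matrix(3)
size = len(states)
uniform = [F(1, size)] * size
uniform_P = [sum(uniform[i] * P[i][k] for i in range(size)) for k in range(size)]
print(size)
print(uniform_P == uniform)
```

Already for $n = 10$ the same construction would need a matrix with $10! = 3{,}628{,}800$ rows, which illustrates the practical obstacle described above.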


1.7 Exercises 


1.1 The Smiths receive the paper every morning and place it on a pile after 
reading it. Each afternoon, with probability 1/3, someone takes all the papers 
in the pile and puts them in the recycling bin. Also, if ever there are at least 
five papers in the pile, Mr. Smith (with probability 1) takes the papers to 
the bin. Consider the number of papers in the pile in the evening. Is it 
reasonable to model this by a Markov chain? If so, what are the state space 
and transition matrix? 


1.2 Consider a Markov chain with state space {0,1} and transition matrix 
0 1 
p_°|1 /3 2/3 
| By 


Assuming that the chain starts in state 0 at time $n = 0$, what is the probability
that it is in state 1 at time $n = 3$?


1.3 Consider a Markov chain with state space $\{1, 2, 3\}$ and transition matrix

\[ P = \begin{array}{c|ccc}
    & 1 & 2 & 3 \\ \hline
  1 & .4 & .2 & .4 \\
  2 & .6 & 0 & .4 \\
  3 & .2 & .5 & .3
\end{array} \]

What is the probability in the long run that the chain is in state 1? Solve this 
problem two different ways: 1) by raising the matrix to a high power; and 2) 
by directly computing the invariant probability vector as a left eigenvector.


1.4 Do the same for the transition matrix
i 2-3 
1| .2.4 .4 
P= oF 4 
3] .6.3.1 


1.5 Consider the Markov chain with state space S = {0,... ,5} and transi- 
tion matrix 


0 1 2 3 4 #5 

of 5.5000 0 

i| 3.7000 0 
B80 DL O.0 0 
3| .25 .25 0 0 .25 .25 

41 0 0.70 3 0 

si 0F <2 0620: A 


What are the communication classes? Which ones are recurrent and which are 
transient? Suppose the system starts in state 0. What is the probability that 
it will be in state 0 at some large time? Answer the same question assuming 
the system starts in state 5. 


1.6 Assume that the chain in Exercise 1.3 starts in state 2. What is the 
expected number of time intervals until the chain is in state 2 again? 


1.7 Let $X_n$ be an irreducible Markov chain on the state space $\{1, \ldots, N\}$.
Show that there exist $C < \infty$ and $\rho < 1$ such that for any states $i, j$,

\[ P\{X_m \ne j, \; m = 0, \ldots, n \mid X_0 = i\} \le C \rho^n. \]

Show that this implies that $E(T) < \infty$, where $T$ is the first time that the
Markov chain reaches the state $j$. (Hint: there exists a $\delta > 0$ such that for all
$i$, the probability of reaching $j$ some time in the first $N$ steps, starting at $i$,
is greater than $\delta$. Why?)


1.8 Consider simple random walk on the graph below. (Recall that simple 
random walk on a graph is the Markov chain which at each time moves to an 
adjacent vertex, each adjacent vertex having the same probability. ) 

[Figure: graph with labeled vertices (including A, B, and C).]

(a) In the long run, about what fraction of time is spent in vertex A?

(b) Suppose a walker starts in vertex A. What is the expected number of 
steps until the walker returns to A? 

(c) Suppose a walker starts in vertex C. What is the expected number of
visits to B before the walker reaches A?

(d) Suppose the walker starts in vertex B. What is the probability that the
walker reaches A before the walker reaches C?

(e) Again assume the walker starts in C. What is the expected number of
steps until the walker reaches A?


1.9 Consider the Markov chain with state space {1,2,3,4,5} and matrix

\[
P = \begin{array}{c|ccccc}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & 0 & 1/3 & 2/3 & 0 & 0 \\
2 & 0 & 0 & 0 & 1/4 & 3/4 \\
3 & 0 & 0 & 0 & 1/2 & 1/2 \\
4 & 1 & 0 & 0 & 0 & 0 \\
5 & 1 & 0 & 0 & 0 & 0
\end{array}
\]

(a) Is the chain irreducible?
(b) What is the period of the chain?
(c) What are $p_{1000}(2,1)$, $p_{1000}(2,2)$, $p_{1000}(2,4)$ (approximately)?

(d) Let T be the first return time to the state 1, starting at state 1. What
is the distribution of T and what is E(T)? What does this say, without any
further calculation, about $\pi(1)$?

(e) Find the invariant probability $\pi$. Use this to find the expected return
time to state 2, starting in state 2.


1.10 Suppose $X_n$ is a Markov chain with state space {0, 1, ..., 6} and
transition probabilities

\[
p(0,0) = \tfrac{3}{4}, \quad p(0,1) = \tfrac{1}{4},
\]
\[
p(1,0) = \tfrac{1}{2}, \quad p(1,1) = \tfrac{1}{4}, \quad p(1,2) = \tfrac{1}{4},
\]
\[
p(6,0) = \tfrac{1}{4}, \quad p(6,5) = \tfrac{1}{4}, \quad p(6,6) = \tfrac{1}{2},
\]

and for j = 2, 3, 4, 5,

\[
p(j,0) = p(j,j-1) = p(j,j) = p(j,j+1) = \tfrac{1}{4}.
\]

(a) Is this chain irreducible? Is it aperiodic?

(b) Suppose the chain has been running for a long time and we start
watching the chain. What is the probability that the next three states will be
4, 5, 0 in that order?


(c) Suppose the chain starts in state 1. What is the probability that it 
reaches state 6 before reaching state 0? 


(d) Suppose the chain starts in state 3. What is the expected number of 
steps until the chain is in state 3 again? 


(e) Suppose the chain starts in state 0. What is the expected number of 
steps until the chain is in state 6? 


1.11 Let $X_1, X_2, \ldots$ be the successive values from independent rolls of a
standard six-sided die. Let $S_n = X_1 + \cdots + X_n$. Let

\[
T_1 = \min\{n \ge 1 : S_n \text{ is divisible by } 8\},
\]

\[
T_2 = \min\{n \ge 1 : S_n - 1 \text{ is divisible by } 8\}.
\]

Find $E(T_1)$ and $E(T_2)$. (Hint: consider the remainder of $S_n$ after division by
8 as a Markov chain.)


1.12 Let $X_n$, $Y_n$ be independent Markov chains with state space {0, 1, 2}
and transition matrix

\[
P = \begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & 1/2 & 1/4 & 1/4 \\
1 & 1/4 & 1/4 & 1/2 \\
2 & 0 & 1/2 & 1/2
\end{array}
\]

Suppose $X_0 = 0$, $Y_0 = 2$ and let

\[
T = \min\{n : X_n = Y_n\}.
\]

(a) Find E(T).

(b) What is $P\{X_T = 2\}$?

(c) In the long run, what percentage of the time are both chains in the same
state?

[Hint: consider the nine-state Markov chain $Z_n = (X_n, Y_n)$.]


1.13 Consider the Markov chain described in Exercise 1.1. 
(a) After a long time, what would be the expected number of papers in the 
pile? 
(b) Assume the pile starts with 0 papers. What is the expected time until 
the pile will again have 0 papers?


1.14 Let $X_n$ be a Markov chain on state space {1, 2, 3, 4, 5} with transition
matrix

\[
P = \begin{array}{c|ccccc}
  & 1 & 2 & 3 & 4 & 5 \\ \hline
1 & 0 & 1/2 & 1/2 & 0 & 0 \\
2 & 0 & 0 & 0 & 1/5 & 4/5 \\
3 & 0 & 0 & 0 & 2/5 & 3/5 \\
4 & 1 & 0 & 0 & 0 & 0 \\
5 & 1/2 & 0 & 0 & 0 & 1/2
\end{array}
\]


(a) Is this chain irreducible? Is it aperiodic? 

(b) Find the stationary probability vector. 

(c) Suppose the chain starts in state 1. What is the expected number of 
steps until it is in state 1 again? 

(d) Again, suppose $X_0 = 1$. What is the expected number of steps until
the chain is in state 4?

(e) Again, suppose $X_0 = 1$. What is the probability that the chain will
enter state 5 before it enters state 3?


1.15 Let $X_n$ be an irreducible Markov chain with state space S starting at
state i with transition matrix P. Let

\[
T = \min\{n > 0 : X_n = i\}
\]

be the first time that the chain returns to state i. For each state j let r(j) be
the expected number of visits to j before returning to i,

\[
r(j) = E\left[\sum_{n=0}^{T-1} I\{X_n = j\}\right].
\]

Note that r(i) = 1.
(a) Let $\bar r$ be the vector whose jth component is r(j). Show that $\bar r P = \bar r$.
(b) Show that

\[
E(T) = \sum_{j \in S} r(j).
\]

(c) Conclude that $E(T) = \pi(i)^{-1}$, where $\pi$ denotes the invariant probability.


1.16 Consider simple random walk on the circle {0, 1, ..., N − 1} started at
0 as described in Section 1.6. Show that the distribution of $X_{T_N}$ is uniform
on {1, 2, ..., N − 1}.


1.17 The complete graph on {1, ..., N} is the simple graph with these
vertices such that any pair of distinct points is adjacent. Let $X_n$ denote simple
random walk on this graph and let T be the first time that the walk reaches
the state 1.

(a) Give the distribution of T assuming $X_0 = 1$. Verify (1.11) for this chain.

(b) What is $E[T \mid X_0 = 2]$?

(c) Find the expected number of steps needed until every point has been 
visited at least once. 


1.18 Suppose we take a standard deck of cards with 52 cards and do the 
card shuffling procedure as in Section 1.6. Suppose we do one move every 
second. What is the expected amount of time in years until the deck returns 
to the original order? 


1.19 Suppose we flip a fair coin repeatedly until we have flipped four con- 
secutive heads. What is the expected number of flips that are needed? (Hint: 
consider a Markov chain with state space {0,1,... ,4}.) 


1.20 In this exercise we outline a proof of the Perron-Frobenius Theorem
about matrices with positive entries. Let $A = (a_{ij})$ be an N × N matrix with
$a_{ij} > 0$ for all i, j. For vectors $\bar u = (u^1, \ldots, u^N)$ and $\bar v = (v^1, \ldots, v^N)$ we
write $\bar u \ge \bar v$ if $u^i \ge v^i$ for each i and $\bar u > \bar v$ if $u^i > v^i$ for each i. We write
$\bar 0 = (0, \ldots, 0)$.

(a) Show that if $\bar v \ge \bar 0$ and $\bar v \ne \bar 0$, then $A\bar v > \bar 0$.

For any vector $\bar v \ge \bar 0$, let $g(\bar v)$ be the largest $\lambda$ such that

\[
A\bar v \ge \lambda \bar v.
\]


(b) Show that $g(\bar v) > 0$ for any nonzero $\bar v \ge \bar 0$ and if c > 0 then
$g(c\bar v) = g(\bar v)$.

Let

\[
\alpha = \sup g(\bar v),
\]

where the supremum is over all nonzero $\bar v \ge \bar 0$. By (b) we can consider the
supremum over all $\bar v$ with

\[
\|\bar v\|^2 = (v^1)^2 + \cdots + (v^N)^2 = 1.
\]

By continuity of the function g on $\{\|\bar v\| = 1\}$ it can be shown that there exists
at least one vector $\bar v \ge \bar 0$ with $g(\bar v) = \alpha$.

(c) Show that for any $\bar v$ with $g(\bar v) = \alpha$,

\[
A\bar v = \alpha \bar v,
\]

i.e., $\bar v$ is an eigenvector with eigenvalue $\alpha$. [Hint: we know by definition that
$A\bar v \ge \alpha \bar v$. Assume that they are not equal and consider

\[
A[A\bar v - \alpha \bar v],
\]

using (a).]

(d) Show that there is a unique $\bar v \ge \bar 0$ with $g(\bar v) = \alpha$ and $\sum_i v^i = 1$.
[Hint: assume there were two such vectors, $\bar v_1, \bar v_2$, and consider
$g(\bar v_1 - \bar v_2)$ and $g(|\bar v_1 - \bar v_2|)$ where

\[
|(v^1, \ldots, v^N)| = (|v^1|, \ldots, |v^N|).]
\]

(e) Show that all the components of the $\bar v$ in (c) are strictly positive. [Hint:
if $A\bar v \ge \lambda \bar v$ then $A(A\bar v) \ge \lambda A\bar v$.]

(f) Show that if $\lambda$ is any other eigenvalue of A, then $|\lambda| < \alpha$. [Hint: assume
$A\bar u = \lambda \bar u$ and consider $A|\bar u|$.]

(g) Show that if B is any (N − 1) × (N − 1) submatrix of A, then all the
eigenvalues of B have absolute value strictly less than $\alpha$. [Hint: since B is a
matrix with positive entries, (a)-(f) apply to B.]

(h) Consider

\[
f(\lambda) = \det(A - \lambda I).
\]

Show that

\[
f'(\lambda) = -\sum_{i=1}^{N} \det(B_i - \lambda I),
\]

where $B_i$ denotes the submatrix of A obtained by deleting the ith row and
ith column.

(i) Use (g) and (h) to conclude that

\[
f'(\alpha) \neq 0,
\]

and hence that $\alpha$ is a simple eigenvalue for A.

(j) Explain why every stochastic matrix with strictly positive entries has a 
unique invariant probability with all positive components. (Apply the above 
results to the transpose of the stochastic matrix.) 


1.21 An elementary theorem in number theory states that if two integers 
m and n are relatively prime (i.e., greatest common divisor equal to 1), then 
there exist integers x and y (positive or negative) such that 


\[
mx + ny = 1.
\]

Using this theorem show the following:
(a) If m and n are relatively prime then the set

\[
\{xm + ny : x, y \text{ positive integers}\}
\]

contains all but a finite number of the positive integers.

(b) Let J be a set of nonnegative integers whose greatest common divisor
is d. Suppose also that J is closed under addition, $m, n \in J \Rightarrow m + n \in J$.
Then J contains all but a finite number of integers in the set {0, d, 2d, ...}.


Chapter 2 


Countable Markov Chains 


2.1 Introduction 


In this chapter, we consider (time-homogeneous) Markov chains with a 
countably infinite state space. A set is countably infinite if it can be put into 
one-to-one correspondence with the set of nonnegative integers {0,1,2,...}. 
Examples of such sets are: Z, the set of all integers; 2Z, the set of even
integers; and $Z^2$, the set of lattice points in the plane,

\[
Z^2 = \{(i,j) : i, j \text{ integers}\}.
\]

(The reader may wish to consider how $Z^2$ and {0,1,2,...} can be put into
one-to-one correspondence.) Not all infinite sets are countably infinite; for 
example, the set of real numbers cannot be put into one-to-one correspondence 
with the positive integers. 

We will again let $X_n$ denote a Markov chain. Some of that which was
described for finite-state Markov chains holds equally well in the infinite case;
however, some things become a bit trickier. We again can speak of the
transition matrix, but in this case it becomes an infinite matrix. We will choose not
to use the matrix notation here, but simply write the transition probabilities
as

\[
p(x,y) = P\{X_1 = y \mid X_0 = x\}, \quad x, y \in S.
\]

The transition probabilities are nonnegative and the “rows” add up to 1, i.e.,
for each $x \in S$,

\[
\sum_{y \in S} p(x,y) = 1.
\]

We have chosen to use x, y, z for elements of the state space S. We also define
the n-step transition probabilities

\[
p_n(x,y) = P\{X_n = y \mid X_0 = x\}.
\]


If $0 < m, n < \infty$,

\[
\begin{aligned}
p_{m+n}(x,y) &= P\{X_{m+n} = y \mid X_0 = x\} \\
  &= \sum_{z \in S} P\{X_{m+n} = y,\ X_m = z \mid X_0 = x\} \\
  &= \sum_{z \in S} p_m(x,z)\, p_n(z,y).
\end{aligned}
\]


This equation is sometimes called the Chapman—Kolmogorov equation. It can 
be considered the definition of matrix multiplication for infinite matrices. 


Example 1. Random Walk with Partially Reflecting Boundary at
0. Let $0 < p < 1$ and S = {0,1,2,...}.

The transition probabilities are given by

\[
p(x,x-1) = 1-p, \quad p(x,x+1) = p, \quad x > 0,
\]

and

\[
p(0,0) = 1-p, \quad p(0,1) = p.
\]

Example 2. Simple Random Walk on the Integer Lattice. Let $Z^d$ be
the d-dimensional integer lattice, i.e.,

\[
Z^d = \{(z_1, \ldots, z_d) : z_i \text{ integers}\}.
\]

Note that each element x of $Z^d$ has 2d “nearest neighbors” in $Z^d$ which are
distance 1 from x. Simple random walk on $Z^d$ is the process $X_n$ taking values
in $Z^d$ which at each time moves to one of the 2d nearest neighbors of its current
position, choosing equally among all the nearest neighbors. More precisely, it
is the Markov chain with state space $S = Z^d$ and

\[
p(x,y) = \begin{cases} \dfrac{1}{2d}, & \text{if } |x-y| = 1, \\[4pt] 0, & \text{otherwise.} \end{cases}
\]


Example 3. Queueing Model. Let $X_n$ be the number of customers waiting
in line for some service. We think of the first person in line as being serviced
while all others are waiting their turn. During each time interval there is a
probability p that a new customer arrives. With probability q, the service
for the first customer is completed and that customer leaves the queue. We
put no limit on the number of customers waiting in line. This is a Markov
chain with state space {0,1,2,...} and transition probabilities (see Example
2, Section 1.1):

\[
\begin{aligned}
& p(x,x-1) = q(1-p), \quad p(x,x) = qp + (1-q)(1-p), \\
& p(x,x+1) = p(1-q), \quad x > 0; \\
& p(0,0) = 1-p, \quad p(0,1) = p.
\end{aligned}
\]
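
Although the transition matrix is infinite, the n-step probabilities from a
fixed starting state involve only finitely many states, so they can be
computed by propagating a dictionary of probabilities. The sketch below does
this for the queueing chain and numerically checks the Chapman-Kolmogorov
equation; the function names and the parameter values in the demo are
assumptions made for illustration.

```python
def queue_row(x, p, q):
    """Return {next_state: probability} for the queueing chain at state x."""
    if x == 0:
        return {0: 1 - p, 1: p}
    return {
        x - 1: q * (1 - p),
        x: q * p + (1 - q) * (1 - p),
        x + 1: p * (1 - q),
    }


def n_step(x, n, p, q):
    """Distribution of X_n given X_0 = x, i.e. the map y -> p_n(x, y)."""
    dist = {x: 1.0}
    for _ in range(n):
        nxt = {}
        for state, mass in dist.items():
            for y, prob in queue_row(state, p, q).items():
                nxt[y] = nxt.get(y, 0.0) + mass * prob
        dist = nxt
    return dist


def chapman_kolmogorov_gap(x, y, m, n, p, q):
    """|p_{m+n}(x,y) - sum_z p_m(x,z) p_n(z,y)|, which should be near 0."""
    lhs = n_step(x, m + n, p, q).get(y, 0.0)
    rhs = sum(mass * n_step(z, n, p, q).get(y, 0.0)
              for z, mass in n_step(x, m, p, q).items())
    return abs(lhs - rhs)


if __name__ == "__main__":
    # Hypothetical arrival and service probabilities.
    print(n_step(2, 3, p=0.3, q=0.5))
    print(chapman_kolmogorov_gap(2, 3, 4, 5, p=0.3, q=0.5))
```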


As in the case of finite Markov chains, our goal will be to understand the 
behavior for large time. Some of the ideas for finite chains apply equally 
well to the infinite case. For example, the notion of communication classes 
applies equally well here. Again, we call a Markov chain irreducible if all the 
states communicate. All the examples discussed in this chapter are irreducible 
except for a couple of cases where all the states but one communicate and that 
one state x is absorbing, p(x, x) = 1. We can also talk of the period of an
irreducible chain; Examples 1 and 3 above are aperiodic, whereas Example 
2 has period 2. It will not always be the case that an irreducible, aperiodic 
Markov chain with infinite state space converges to an equilibrium probability 
distribution. 


2.2 Recurrence and Transience 


Suppose $X_n$ is an irreducible Markov chain with countably infinite state
space S and transition probabilities p(x, y). We say that $X_n$ is a recurrent
chain if for each state x,

\[
P\{X_n = x \text{ for infinitely many } n\} = 1,
\]

i.e., if the chain returns infinitely often to x. If an irreducible chain visits a
certain state x infinitely often then it must visit every state infinitely often.
(The basic reason is that if y is another state there is a positive probability
of reaching y from x. If x is visited infinitely often then we get this chance
of reaching y infinitely often. If a certain event has a positive probability of 
occurring, and we get an infinite number of trials, then the event will occur 
an infinite number of times.) If the chain is not recurrent, then every state is 
visited only a finite number of times. In this case, the chain is called transient. 
It is not always easy to determine whether a given Markov chain is recurrent 
or transient. In this section we give two criteria for determining this.

Fix a site x and assume that $X_0 = x$. Consider the random variable R
which gives the total number of visits to the site x, including the initial visit.
We can write R as

\[
R = \sum_{n=0}^{\infty} I\{X_n = x\},
\]

where again we use I to denote the indicator function, which equals 1 if the
event occurs and 0 otherwise. If the chain is recurrent then R is identically
equal to infinity; if the chain is transient, then $R < \infty$ with probability 1. We
can compute the expectation of R (assuming $X_0 = x$),

\[
E(R) = E\left[\sum_{n=0}^{\infty} I\{X_n = x\}\right]
     = \sum_{n=0}^{\infty} P\{X_n = x\}
     = \sum_{n=0}^{\infty} p_n(x,x).
\]


We will now compute E(R) in a different way. Let T be the time of first
return to x,

\[
T = \min\{n > 0 : X_n = x\}.
\]

We say that $T = \infty$ if the chain never returns to x. Suppose $P\{T < \infty\} = 1$.
Then with probability one, the chain always returns and by continuing we
see that the probability that the chain returns infinitely often is 1 and the
chain is recurrent. Now suppose $P\{T < \infty\} = q < 1$, and let us compute the
distribution of R in terms of q. First, R = 1 if and only if the chain never
returns; hence, $P\{R = 1\} = 1 - q$. If m > 1, then R = m if and only if the
chain returns m − 1 times and then does not return for the mth time. Hence,
$P\{R = m\} = q^{m-1}(1-q)$. Therefore, in the transient case, q < 1,

\[
E(R) = \sum_{m=1}^{\infty} m\, P\{R = m\} = \sum_{m=1}^{\infty} m\, q^{m-1}(1-q) = \frac{1}{1-q} < \infty.
\]


We have concluded the following: 


Fact. An irreducible Markov chain is transient if and only if the expected
number of returns to a state is finite, i.e., if and only if

\[
\sum_{n=0}^{\infty} p_n(x,x) < \infty.
\]


Example. Simple Random Walk in $Z^d$. We first take d = 1, and consider
the Markov chain on the integers with transition probabilities

\[
p(x,x+1) = p(x,x-1) = \frac{1}{2}.
\]

We will concentrate on the state x = 0 and assume $X_0 = 0$. Since this chain
has period 2, $p_n(0,0) = 0$ for n odd. We will write down an exact expression
for $p_{2n}(0,0)$. Suppose the walker is to be at 0 after 2n steps. Then the walker
must take exactly n steps to the right and n steps to the left. Any “path” 
of length 2n that takes exactly n steps to the right and n steps to the left is 
equally likely. 


FIGURE 2.1: The graph of a random walk path that is at the origin after 
16 steps. 


Each such path has probability $(1/2)^{2n}$ of occurring since it combines 2n
events each with probability 1/2. There are $\binom{2n}{n}$ ways of choosing which n
of the 2n steps should be to the right, and then the other n are to the left.
Therefore,

\[
p_{2n}(0,0) = \binom{2n}{n} \left(\frac{1}{2}\right)^{2n} = \frac{(2n)!}{n!\,n!} \left(\frac{1}{2}\right)^{2n}.
\]


It is not so easy to see what this looks like for large values of n. However, we 
can use Stirling’s formula to estimate the factorials. Stirling’s formula (see
Exercise 2.18) states that

\[
n! \sim \sqrt{2\pi n}\; n^n e^{-n},
\]

where $\sim$ means that the ratio of the two sides approaches 1 as n goes to
infinity. If we plug this into the above expressions we get that

\[
p_{2n}(0,0) \sim \frac{1}{\sqrt{\pi n}}. \tag{2.1}
\]

In particular, since $\sum n^{-1/2} = \infty$,

\[
\sum_{n=0}^{\infty} p_{2n}(0,0) = \infty,
\]


and simple random walk in one dimension is recurrent. 
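
The estimate (2.1) and the divergence of the sum are easy to check
numerically. The following is a minimal sketch using only the standard
library; the function names are chosen here for illustration.

```python
import math


def exact_return_probability(n):
    """p_{2n}(0,0) = C(2n, n) (1/2)^(2n) for simple random walk on Z."""
    return math.comb(2 * n, n) / 4 ** n


def stirling_estimate(n):
    """The asymptotic value 1 / sqrt(pi n) from (2.1)."""
    return 1.0 / math.sqrt(math.pi * n)


def partial_sum(limit):
    """Sum of p_{2n}(0,0) over n = 0, ..., limit."""
    total, term = 0.0, 1.0  # term holds p_{2n}(0,0); p_0(0,0) = 1
    for n in range(limit + 1):
        total += term
        # Ratio p_{2(n+1)}(0,0) / p_{2n}(0,0) = (2n + 1) / (2n + 2).
        term *= (2 * n + 1) / (2 * n + 2)
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, exact_return_probability(n) / stirling_estimate(n))
    for limit in (100, 10_000, 1_000_000):
        print(limit, partial_sum(limit))
```

The printed ratios should approach 1, while the partial sums keep growing
(roughly like a constant times the square root of the number of terms),
which is the numerical face of recurrence in one dimension.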
We now take d > 1 so that the chain is on the d-dimensional integer lattice
$Z^d$ and has transition probabilities

\[
p(x,y) = 1/2d, \quad |x-y| = 1.
\]


FIGURE 2.2: The lattice $Z^2$.


Again we start the walk at 0 = (0, ..., 0). We will try to get an asymptotic
expression for $p_{2n}(0,0)$ [again $p_n(0,0) = 0$ for n odd]. The combinatorics
are somewhat more complicated in this case, so we will give only a sketch of
the derivation. Suppose a walker takes 2n steps. Then by the law of large
numbers, for large values of n, we expect that 2n/d of these steps will be
taken in each of the d components. We will need the number of steps in each
component to be even if we have any chance of being at 0 in 2n steps. For large
n the probability of this occurring is about $(1/2)^{d-1}$ (whether or not an even
number of steps have been taken in each of the first d − 1 components are
almost independent events; however, we know that if an even number of steps
have been taken in the first d − 1 components then an even number of steps
have been taken in the last component as well since the total number of steps
taken is even). In each component, if about 2n/d steps have been taken, then
by (2.1) we would expect that the probability that that component equals 0
is about $(\pi(n/d))^{-1/2}$. Combining this, we get an asymptotic expression

\[
p_{2n}(0,0) \sim \left(\frac{1}{2}\right)^{d-1} \left(\frac{d}{\pi n}\right)^{d/2}.
\]


Recall that $\sum n^{-a} < \infty$ if and only if a > 1. Hence,

\[
\sum_{n=0}^{\infty} p_{2n}(0,0) \;
\begin{cases} = \infty, & d = 1, 2, \\ < \infty, & d \ge 3. \end{cases}
\]

We have derived the following.

Fact. Simple random walk in $Z^d$ is recurrent if d = 1 or 2 and is transient if
$d \ge 3$.
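
The dependence on the dimension can be seen numerically. The sketch below
evaluates the asymptotic expression $(1/2)^{d-1} (d/(\pi n))^{d/2}$ and sums it
over ranges of n. For d = 2 it also compares against the exact value
$[\binom{2n}{n} 4^{-n}]^2$; that identity is a standard fact about planar simple
random walk which is not derived in the text and is used here only as an
outside check. All function names are assumptions of this sketch.

```python
import math


def asymptotic_return_probability(n, d):
    """(1/2)^(d-1) (d / (pi n))^(d/2), the estimate for p_{2n}(0,0) in Z^d."""
    return 0.5 ** (d - 1) * (d / (math.pi * n)) ** (d / 2)


def exact_return_probability_2d(n):
    """[C(2n, n) / 4^n]^2: standard identity for Z^2, not from the text."""
    one_dim = math.comb(2 * n, n) / 4 ** n
    return one_dim ** 2


def asymptotic_sum(start, stop, d):
    """Sum of the asymptotic estimate over n = start, ..., stop."""
    return sum(asymptotic_return_probability(n, d)
               for n in range(start, stop + 1))


if __name__ == "__main__":
    n = 500
    print(exact_return_probability_2d(n) / asymptotic_return_probability(n, 2))
    for d in (1, 2, 3):
        # Contribution of the block 10^4 < n <= 10^5 to the series.
        print(d, asymptotic_sum(10_001, 100_000, d))
```

For d = 1 and d = 2 each further block of terms keeps contributing a
non-negligible amount, whereas for d = 3 the late blocks contribute almost
nothing, in line with the convergence criterion for $\sum n^{-a}$.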


We now consider another method for determining recurrence or transience.
Suppose $X_n$ is an irreducible Markov chain and consider a fixed state which
we will denote z. For each state x, we set

\[
\alpha(x) = P\{X_n = z \text{ for some } n \ge 0 \mid X_0 = x\}.
\]

Clearly, $\alpha(z) = 1$. If the chain is recurrent, then $\alpha(x) = 1$ for all x. However,
if the chain is transient there must be states x with $\alpha(x) < 1$. In fact, although
not quite as obviously, if the chain is transient there must be points “farther
and farther” away from z with $\alpha(x)$ as small as we like.
If $x \ne z$, then

\[
\begin{aligned}
\alpha(x) &= P\{X_n = z \text{ for some } n \ge 0 \mid X_0 = x\} \\
  &= P\{X_n = z \text{ for some } n \ge 1 \mid X_0 = x\} \\
  &= \sum_{y \in S} P\{X_1 = y \mid X_0 = x\}\, P\{X_n = z \text{ for some } n \ge 1 \mid X_1 = y\} \\
  &= \sum_{y \in S} p(x,y)\, \alpha(y).
\end{aligned}
\]


Summarizing, $\alpha(x)$ satisfies the following:

\[
0 \le \alpha(x) \le 1, \tag{2.2}
\]

\[
\alpha(z) = 1, \quad \inf\{\alpha(x) : x \in S\} = 0, \tag{2.3}
\]

and

\[
\alpha(x) = \sum_{y \in S} p(x,y)\, \alpha(y), \quad x \ne z. \tag{2.4}
\]

It turns out that if $X_n$ is transient, then there is a unique solution to
(2.2)-(2.4) that must correspond to the appropriate probability. Moreover, it can
be shown (we prove this in Chapter 5, Section 5.5, Example 5) that if $X_n$ is
recurrent there is no solution to (2.2)-(2.4). This then gives another method
to determine recurrence or transience:


Fact. An irreducible Markov chain is transient if and only if for any z we
can find a function $\alpha(x)$ satisfying (2.2)-(2.4).


Example. Consider Example 1 in the previous section, random walk with
partially reflecting boundary. Let z = 0 and let us try to find a solution to
(2.2)-(2.4). The third equation states that

\[
\alpha(x) = (1-p)\,\alpha(x-1) + p\,\alpha(x+1), \quad x > 0.
\]

From (0.5) and (0.6) we see that the only solutions to the above equation are
of the form

\[
\alpha(x) = c_1 + c_2 \left(\frac{1-p}{p}\right)^x, \quad p \ne 1/2,
\]

\[
\alpha(x) = c_1 + c_2\, x, \quad p = 1/2.
\]

The first condition in (2.3) gives $\alpha(0) = 1$; plugging this in gives

\[
\alpha(x) = (1 - c_2) + c_2 \left(\frac{1-p}{p}\right)^x, \quad p \ne 1/2, \tag{2.5}
\]

\[
\alpha(x) = 1 + c_2\, x, \quad p = 1/2. \tag{2.6}
\]

If we choose $c_2 = 0$, we get $\alpha(x) = 1$ for all x which clearly does not satisfy
(2.3). If p = 1/2 and $c_2 \ne 0$, then the solution is not bounded and hence
cannot satisfy (2.2). Similarly, if p < 1/2, the solution to (2.5) will be
unbounded for $c_2 \ne 0$. In this case, we can conclude that the chain is recurrent
for $p \le 1/2$. For p > 1/2, we can find a solution. The second condition in
(2.3) essentially boils down to $\alpha(x) \to 0$ as $x \to \infty$, and we get

\[
\alpha(x) = \left(\frac{1-p}{p}\right)^x.
\]


Therefore, for p > 1/2, the chain is transient. 
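
As a numerical cross-check of this example, one can truncate the state space
at a large level and compute the probability of reaching 0 before that level;
for p > 1/2 this should be close to $((1-p)/p)^x$ for small x. The sketch
below does this with Gauss-Seidel sweeps. The truncation level, the number of
sweeps, and the function names are assumptions of this illustration, not part
of the text's argument.

```python
def hitting_probability_guess(x, p):
    """alpha(x) = ((1-p)/p)**x, the solution found above for p > 1/2."""
    return ((1 - p) / p) ** x


def truncated_hitting_probabilities(p, size, sweeps=3000):
    """Probability of reaching 0 before `size`, via Gauss-Seidel sweeps.

    Solves h(x) = (1-p) h(x-1) + p h(x+1) for 0 < x < size with the
    boundary values h(0) = 1 and h(size) = 0.
    """
    h = [0.0] * (size + 1)
    h[0] = 1.0
    for _ in range(sweeps):
        for x in range(1, size):
            h[x] = (1 - p) * h[x - 1] + p * h[x + 1]
    return h


if __name__ == "__main__":
    p = 0.6  # hypothetical value with p > 1/2 (transient case)
    h = truncated_hitting_probabilities(p, size=60)
    for x in range(6):
        print(x, h[x], hitting_probability_guess(x, p))
```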


2.3. Positive Recurrence and Null Recurrence 


Suppose $X_n$ is an irreducible, aperiodic Markov chain on the infinite state
space S. In this section we investigate when a limiting probability distribution
exists. A limiting probability $\pi(x)$, $x \in S$, is a probability distribution on S
such that for each $x, y \in S$,

\[
\lim_{n \to \infty} p_n(y,x) = \pi(x).
\]

If $X_n$ is transient, then

\[
\lim_{n \to \infty} p_n(y,x) = 0, \tag{2.7}
\]


for all x, y, so no limiting probability distribution exists. It is possible, how- 
ever, for (2.7) to hold for a recurrent chain. Consider, for example, simple 
random walk on Z described in the last section (this is actually a periodic 
chain, but a small modification can be made to give an aperiodic example). 
It is recurrent but $p_{2n}(0,0) \to 0$ as $n \to \infty$. We call a chain null recurrent if
it is recurrent but

\[
\lim_{n \to \infty} p_n(x,y) = 0.
\]


Otherwise, a recurrent chain is called positive recurrent. 

Positive recurrent chains behave very similarly to finite Markov chains. If
$X_n$ is an irreducible, aperiodic, positive recurrent Markov chain, then for every
x, y, the limit

\[
\lim_{n \to \infty} p_n(y,x) = \pi(x) > 0
\]

exists and is independent of the initial state y. The $\pi(x)$ give an invariant
probability distribution on S, i.e.,

\[
\sum_{y \in S} \pi(y)\, p(y,x) = \pi(x). \tag{2.8}
\]

Moreover, if we consider the return time to a state x,

\[
T = \min\{n > 0 : X_n = x\},
\]

then for a positive recurrent chain,

\[
E(T \mid X_0 = x) = \frac{1}{\pi(x)}.
\]

If $X_n$ is null recurrent, then $T < \infty$ with probability 1, but $E(T) = \infty$. If $X_n$
is transient, then $T = \infty$ with positive probability.

One way to determine whether or not a chain is positive recurrent is to 
try to find an invariant probability distribution. It can be proved that if an 
irreducible chain is positive recurrent, then there exists a unique probability 
distribution satisfying (2.8); moreover, if a chain is not positive recurrent, 
there is no probability distribution satisfying (2.8). This gives a good criterion:
try to find an invariant probability distribution. If it exists, then the chain is
positive recurrent; if none exists, then it is either null recurrent or transient.


Example. Consider again the example of random walk with partially
reflecting boundary. We will try to find a probability distribution that satisfies
(2.8), i.e., a nonnegative function $\pi(x)$ satisfying (2.8) and

\[
\sum_{x \in S} \pi(x) = 1. \tag{2.9}
\]

In this example, (2.8) gives

\[
\pi(x+1)(1-p) + \pi(x-1)\,p = \pi(x), \quad x > 0, \tag{2.10}
\]
\[
\pi(1)(1-p) + \pi(0)(1-p) = \pi(0). \tag{2.11}
\]

By (0.5) and (0.6), the general solution to (2.10) is

\[
\pi(x) = c_1 + c_2 \left(\frac{p}{1-p}\right)^x, \quad p \ne 1/2,
\]

\[
\pi(x) = c_1 + c_2\, x, \quad p = 1/2.
\]

Equation (2.11) gives $\pi(0) = [(1-p)/p]\,\pi(1)$. Plugging this into the above
gives

\[
\pi(x) = c_2 \left(\frac{p}{1-p}\right)^x, \quad p \ne 1/2,
\]

\[
\pi(x) = c_1, \quad p = 1/2.
\]

Now we impose the condition (2.9): can we choose the constant $c_1$ or $c_2$ so
that $\sum \pi(x) = 1$? For p = 1/2, it clearly cannot be done. Suppose $p \ne 1/2$.
Clearly, we would need $c_2 \ne 0$. If p > 1/2, $\sum [p/(1-p)]^x = \infty$ and we cannot
find such a $c_2$ (we already knew the chain was transient in this case, so it
could not possibly be positive recurrent). However if p < 1/2, the sum is
finite and we can choose

\[
\pi(x) = \left(\frac{p}{1-p}\right)^x \left[\sum_{y=0}^{\infty} \left(\frac{p}{1-p}\right)^y\right]^{-1}
       = \left(\frac{1-2p}{1-p}\right) \left(\frac{p}{1-p}\right)^x.
\]


y=0 


In this case the chain is positive recurrent and this gives the invariant proba- 
bility. Summarizing the discussion in the last two sections we see that random 
walk with partially reflecting boundary is 


positive recurrent if p < 1/2, 
null recurrent if p = 1/2, 
transient if p > 1/2. 
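
The invariant probability found for p < 1/2 can be verified directly against
(2.9), (2.10) and (2.11). The following sketch does so numerically; the
function names and the value p = 0.3 in the demo are assumptions made for
illustration.

```python
def invariant_pi(x, p):
    """pi(x) = ((1-2p)/(1-p)) (p/(1-p))**x, valid only for 0 < p < 1/2."""
    if not 0 < p < 0.5:
        raise ValueError("the chain is positive recurrent only for 0 < p < 1/2")
    return (1 - 2 * p) / (1 - p) * (p / (1 - p)) ** x


def balance_gap(x, p):
    """Residual of (2.10) when x > 0, or of (2.11) when x = 0."""
    if x == 0:
        lhs = invariant_pi(1, p) * (1 - p) + invariant_pi(0, p) * (1 - p)
    else:
        lhs = invariant_pi(x + 1, p) * (1 - p) + invariant_pi(x - 1, p) * p
    return abs(lhs - invariant_pi(x, p))


if __name__ == "__main__":
    p = 0.3  # hypothetical value with p < 1/2
    print(sum(invariant_pi(x, p) for x in range(2000)))   # close to 1
    print(max(balance_gap(x, p) for x in range(50)))      # close to 0
    print(1 / invariant_pi(0, p))  # expected return time to state 0
```

The last line uses the relation between return times and the invariant
probability stated earlier in this section: the expected return time to
state 0 is $1/\pi(0) = (1-p)/(1-2p)$.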


2.4 Branching Process


In this section we study a stochastic model for population growth. Consider
a population of individuals. We let $X_n$ denote the number of individuals at
time n. At each time interval, the population will change according to the
following rule: each individual will produce a random number of offspring;
after producing the offspring, the individual dies and leaves the system. We
make two assumptions about the reproduction process:


1. Each individual produces offspring with the same probability
distribution: there are given nonnegative numbers $p_0, p_1, p_2, \ldots$ summing to 1 such
that the probability that an individual produces exactly k offspring is $p_k$.


2. The individuals reproduce independently.


The number of individuals at stage n, $X_n$, is then a Markov chain with state
space {0,1,2,...}. Note that 0 is an absorbing state; once the population dies
out, no individuals can be produced. It is not so easy to write down explicitly
the transition probabilities for this chain. Suppose that $X_n = k$. Then k
individuals produce offspring for the (n + 1)st generation. If $Y_1, \ldots, Y_k$ are
independent random variables each with distribution $P\{Y_i = j\} = p_j$, then

\[
p(k,j) = P\{X_{n+1} = j \mid X_n = k\} = P\{Y_1 + \cdots + Y_k = j\}.
\]


The actual distribution of $Y_1 + \cdots + Y_k$ can be expressed in terms of
convolutions, but we will not need the exact form here. Let $\mu$ denote the mean
number of offspring produced by an individual,

\[
\mu = \sum_{i=0}^{\infty} i\, p_i.
\]

Then,

\[
E(X_n \mid X_{n-1} = k) = E(Y_1 + \cdots + Y_k) = k\mu.
\]

It is relatively straightforward to calculate the mean number of individuals,
$E(X_n)$,

\[
\begin{aligned}
E(X_n) &= \sum_{k=0}^{\infty} P\{X_{n-1} = k\}\, E(X_n \mid X_{n-1} = k) \\
  &= \sum_{k=0}^{\infty} k\mu\, P\{X_{n-1} = k\} = \mu\, E(X_{n-1}).
\end{aligned}
\]

Or, if we do this n times,

\[
E(X_n) = \mu^n E(X_0).
\]

Some interesting conclusions can be reached from this expression. If $\mu < 1$,
then the mean number of individuals goes to 0 as n gets large. The easy estimate

\[
E(X_n) = \sum_{k=0}^{\infty} k\, P\{X_n = k\} \ge \sum_{k=1}^{\infty} P\{X_n = k\} = P\{X_n \ge 1\}
\]

can then be used to deduce that the population eventually dies out,

\[
\lim_{n \to \infty} P\{X_n = 0\} = 1.
\]
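
A small simulation illustrates both the growth rate $\mu^n$ of the mean and the
bound $P\{X_n \ge 1\} \le E(X_n)$. This is a sketch under stated assumptions: the
population starts from a single individual, the offspring distribution in the
demo is a hypothetical one with $\mu = 3/4$, and the function names are invented
for this illustration.

```python
import random


def simulate_generation_sizes(offspring_probs, generations, rng):
    """Return [X_0, ..., X_generations] for one run started from X_0 = 1."""
    values = list(range(len(offspring_probs)))
    sizes = [1]
    for _ in range(generations):
        current = sizes[-1]
        if current == 0:
            sizes.append(0)  # 0 is absorbing
            continue
        # Each of the `current` individuals reproduces independently.
        children = rng.choices(values, weights=offspring_probs, k=current)
        sizes.append(sum(children))
    return sizes


def estimate_mean_and_survival(offspring_probs, generations, runs, seed=0):
    """Monte Carlo estimates of E(X_n) and P{X_n >= 1} for n = generations."""
    rng = random.Random(seed)
    total = 0
    alive = 0
    for _ in range(runs):
        final = simulate_generation_sizes(offspring_probs, generations, rng)[-1]
        total += final
        alive += 1 if final >= 1 else 0
    return total / runs, alive / runs


if __name__ == "__main__":
    probs = (0.5, 0.25, 0.25)  # hypothetical p_0, p_1, p_2 with mu = 3/4
    mean, survival = estimate_mean_and_survival(probs, generations=5, runs=20_000)
    print(mean, 0.75 ** 5)   # sample mean versus mu^n
    print(survival)          # estimate of P{X_5 >= 1}
```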


If $\mu = 1$, the expected population size remains constant while for $\mu > 1$, the
expected population size grows. It is not so clear in these cases whether or
not the population dies out with probability 1. [It is possible for $X_n$ to be 0
with probability very near 1, yet $E(X_n)$ not be small.] Below we investigate
how to determine the probability that the population dies out. In order to
avoid trivial cases we will assume that

\[
p_0 > 0; \quad p_0 + p_1 < 1. \tag{2.12}
\]

Let

\[
a_n(k) = P\{X_n = 0 \mid X_0 = k\}
\]

and let a(k) be the probability that the population eventually dies out
assuming that there are k individuals initially,

\[
a(k) = \lim_{n \to \infty} a_n(k).
\]


If the population has k individuals at a certain time, then the only way for 
the population to die out is for all k branches to die out. Since the branches 
act independently, 


\[
a(k) = [a(1)]^k.
\]


It suffices therefore to determine a(1) which we will denote by just a and
call the extinction probability. Assume now that $X_0 = 1$. If we look at one
generation, we get

\[
\begin{aligned}
a &= P\{\text{population dies out} \mid X_0 = 1\} \\
  &= \sum_{k=0}^{\infty} P\{X_1 = k \mid X_0 = 1\}\, P\{\text{population dies out} \mid X_1 = k\} \\
  &= \sum_{k=0}^{\infty} p_k\, a(k) = \sum_{k=0}^{\infty} p_k\, a^k.
\end{aligned}
\]


The quantity on the right is of sufficient interest to give it a name. If X is
a random variable taking values in {0,1,2,...}, the generating function of X
is the function

\[
\phi(s) = \phi_X(s) = E(s^X) = \sum_{k=0}^{\infty} s^k P\{X = k\}.
\]

Note that $\phi(s)$ is an increasing function of s for $s \ge 0$ with $\phi(0) = P\{X = 0\}$
and $\phi(1) = 1$. Differentiating, we get

\[
\phi'(s) = \sum_{k=1}^{\infty} k\, s^{k-1} P\{X = k\},
\]

\[
\phi''(s) = \sum_{k=2}^{\infty} k(k-1)\, s^{k-2} P\{X = k\}.
\]

Hence,

\[
\phi'(1) = \sum_{k=1}^{\infty} k\, P\{X = k\} = E(X), \tag{2.13}
\]

and for s > 0, if $P\{X \ge 2\} > 0$,

\[
\phi''(s) > 0. \tag{2.14}
\]

If $X_1, \ldots, X_m$ are independent random variables taking values in the
nonnegative integers, then

\[
\phi_{X_1 + \cdots + X_m}(s) = \phi_{X_1}(s) \cdots \phi_{X_m}(s).
\]

The easiest way to see this is to use the expression $\phi_X(s) = E(s^X)$ and the
product rule for expectation of independent random variables.

Returning to the branching process we see that the extinction probability
a satisfies the equation

\[
a = \phi(a).
\]

Clearly, a = 1 satisfies this equation, but there could well be other solutions.
Again, we assume $X_0 = 1$. Then the generating function of the random
variable $X_0$ is a and the generating function of $X_1$ is $\phi(a)$. Let $\phi^n(a)$ be the
generating function of $X_n$. We will now show that

\[
\phi^n(a) = \phi(\phi^{n-1}(a)).
\]


To see this, we first note

\[
\begin{aligned}
\phi^n(a) &= \sum_{k=0}^{\infty} P\{X_n = k\}\, a^k \\
  &= \sum_{k=0}^{\infty} \left[\sum_{j=0}^{\infty} P\{X_1 = j\}\, P\{X_n = k \mid X_1 = j\}\right] a^k \\
  &= \sum_{j=0}^{\infty} p_j \sum_{k=0}^{\infty} P\{X_{n-1} = k \mid X_0 = j\}\, a^k.
\end{aligned}
\]

Now, if $X_0 = j$, then $X_{n-1}$ is the sum of j independent random variables
each with the distribution of $X_{n-1}$ given $X_0 = 1$. Hence the sum over k is the
generating function of the sum of j independent random variables each with
generating function $\phi^{n-1}(a)$ and hence

\[
\sum_{k=0}^{\infty} P\{X_{n-1} = k \mid X_0 = j\}\, a^k = [\phi^{n-1}(a)]^j,
\]

and

\[
\phi^n(a) = \sum_{j=0}^{\infty} p_j\, [\phi^{n-1}(a)]^j = \phi(\phi^{n-1}(a)).
\]

We now have a recursive way to find $\phi^n(a)$ and hence to find

\[
a_n(1) = P\{X_n = 0 \mid X_0 = 1\} = \phi^n(0).
\]


We are now ready to demonstrate the following: the extinction probability
a is the smallest positive root of the equation $a = \phi(a)$. We have already seen
that a must satisfy this equation. Let $\hat a$ be the smallest positive root. We will
show by induction that for every n, $a_n = P\{X_n = 0\} \le \hat a$ (which implies that
$a = \lim a_n \le \hat a$). This is obviously true for n = 0 since $a_0 = 0$. Assume that
$a_{n-1} \le \hat a$. Then

\[
P\{X_n = 0\} = \phi^n(0) = \phi(\phi^{n-1}(0)) = \phi(a_{n-1}) \le \phi(\hat a) = \hat a.
\]

The inequality follows from the fact that $\phi$ is an increasing function.

Example 1. Suppose $p_0 = 1/4$, $p_1 = 1/4$, $p_2 = 1/2$. Then $\mu = 5/4$ and

\[
\phi(a) = \frac{1}{4} + \frac{1}{4}\, a + \frac{1}{2}\, a^2.
\]

Solving $a = \phi(a)$ gives the solutions a = 1, 1/2. The extinction probability is
1/2.


Example 2. Suppose $p_0 = 1/2$, $p_1 = 1/4$, $p_2 = 1/4$. Then $\mu = 3/4$ and

\[
\phi(a) = \frac{1}{2} + \frac{1}{4}\, a + \frac{1}{4}\, a^2.
\]

Solving $a = \phi(a)$ gives the solutions a = 1, 2. The extinction probability is 1.
(We had already demonstrated this fact since $\mu < 1$.)


Example 3. Suppose $p_0 = 1/4$, $p_1 = 1/2$, $p_2 = 1/4$. Then $\mu = 1$ and

\[
\phi(a) = \frac{1}{4} + \frac{1}{2}\, a + \frac{1}{4}\, a^2.
\]

Solving $a = \phi(a)$ gives the solutions a = 1, 1. The extinction probability is 1.


We finish by establishing a criterion to determine whether or not a < 1.
We have already seen that if $\mu < 1$, then a = 1. Suppose $\mu = 1$. By (2.13),
$\phi'(1) = 1$ and therefore by (2.14), $\phi'(s) < 1$ for s < 1. Hence for any s < 1,

\[
1 - \phi(s) = \int_s^1 \phi'(t)\, dt < 1 - s,
\]

i.e., $\phi(s) > s$. Therefore, if $\mu = 1$, the extinction probability is 1. This is
an interesting result—even though the expected population size stays at 1, 
the probability that the population has died out increases to 1. One corollary 
of this is that the conditional size of the population conditioned that it has 
not died out must increase with time. That is to say, if one is told at some 
large time that the population has not died out, then one would expect the 
population to be large. 
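
The recursion $a_n = \phi(a_{n-1})$, $a_0 = 0$, derived earlier in this section gives
a simple numerical method for the extinction probability. The sketch below
applies it to the offspring distributions of Examples 1-3; the function names
and the iteration count are assumptions of this illustration.

```python
def generating_function(offspring_probs):
    """Return the function phi(s) = sum_k p_k s**k."""
    def phi(s):
        return sum(p * s ** k for k, p in enumerate(offspring_probs))
    return phi


def extinction_probability(offspring_probs, iterations=10_000):
    """Iterate a_n = phi(a_{n-1}) from a_0 = 0; a_n increases to a."""
    phi = generating_function(offspring_probs)
    a = 0.0
    for _ in range(iterations):
        a = phi(a)
    return a


if __name__ == "__main__":
    examples = {
        "Example 1 (mu = 5/4)": (0.25, 0.25, 0.5),
        "Example 2 (mu = 3/4)": (0.5, 0.25, 0.25),
        "Example 3 (mu = 1)": (0.25, 0.5, 0.25),
    }
    for name, probs in examples.items():
        print(name, extinction_probability(probs))
```

In the critical case of Example 3 the iterates approach 1 only slowly, so
after a fixed number of iterations the computed value is still slightly
below 1, whereas in Examples 1 and 2 the convergence is geometric.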

Now assume pz > 1. Then ¢’(1) > 1 and hence there must be some s < 1 
with o(s) < s. But ¢(0) > 0. By standard continuity arguments, we see that 
there must be some a € (0,5) with ¢(a) = a. Since ¢”(s) > 0 for s € (0,1), the 
curve is convex and there can be at most one s € (0,1) with ¢(s) = s. In this 
case, with positive probability the population lives forever. We summarize 
these results as a theorem. 


Theorem. If $\mu \le 1$ and $p_0 > 0$, the extinction probability $a = 1$, i.e., the population eventually dies out. If $\mu > 1$, then the extinction probability $a < 1$ and equals the unique root of the equation

$$t = \phi(t),$$

with $0 < t < 1$.
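The theorem can be explored numerically. The following is a minimal sketch, not part of the original text: it assumes the standard recursion $a_{n+1} = \phi(a_n)$, $a_0 = 0$, for $a_n = P\{X_n = 0\}$, whose limit is the smallest root of $a = \phi(a)$ in $[0,1]$. The function name `extinction_probability` and its parameters are illustrative choices.

```python
def extinction_probability(p, tol=1e-12, max_iter=100_000):
    """Approximate the extinction probability for offspring distribution p.

    p[k] is the probability of k offspring.  We iterate a <- phi(a)
    starting from a = 0, where phi is the generating function of p.
    """
    def phi(s):
        return sum(pk * s**k for k, pk in enumerate(p))

    a = 0.0
    for _ in range(max_iter):
        nxt = phi(a)
        if abs(nxt - a) < tol:
            return nxt
        a = nxt
    # In the critical case mu = 1 the iteration converges slowly,
    # so we may stop here slightly below the limit.
    return a


example_1 = extinction_probability([0.25, 0.25, 0.5])   # mu = 5/4
example_2 = extinction_probability([0.5, 0.25, 0.25])   # mu = 3/4
example_3 = extinction_probability([0.25, 0.5, 0.25])   # mu = 1
```

For Examples 1 and 2 the iteration converges geometrically to 1/2 and 1. In the critical case of Example 3 it approaches 1 only at rate of order $1/n$, so the returned value is close to, but slightly below, 1.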


2.5 Exercises 


2.1 Consider the queueing model (Example 3 of Section 2.1). For which 
values of p,q is the chain null recurrent, positive recurrent, transient? 
For the positive recurrent case give the limiting probability distribution $\pi$.
What is the average length of the queue in equilibrium? 
For the transient case, give a(x) = the probability starting at x of ever 
reaching state 0. 


2.2 Consider the following Markov chain with state space $S = \{0, 1, \ldots\}$. A sequence of positive numbers $p_1, p_2, \ldots$ is given with $\sum_{i=1}^{\infty} p_i = 1$. Whenever the chain reaches state 0 it chooses a new state according to the $p_i$. Whenever the chain is at a state other than 0 it proceeds deterministically, one step at a time, toward 0. In other words, the chain has transition probability

$$p(x, x-1) = 1, \quad x > 0,$$

$$p(0, x) = p_x, \quad x > 0.$$

This is a recurrent chain since the chain keeps returning to 0. Under what conditions on the $p_x$ is the chain positive recurrent? In this case, what is the limiting probability distribution $\pi$? [Hint: it may be easier to compute $E(T)$ directly where $T$ is the time of first return to 0 starting at 0.]


2.3 Consider the Markov chain with state space $S = \{0, 1, 2, \ldots\}$ and transition probabilities:

$$p(x, x+1) = 2/3, \quad p(x, 0) = 1/3.$$

Show that the chain is positive recurrent and give the limiting probability $\pi$.


2.4 Consider the Markov chain with state space S = {0,1,2,...} and tran- 
sition probabilities: 


$$p(x, x+2) = p, \quad p(x, x-1) = 1 - p, \quad x > 0.$$


For which values of p is this a transient chain? 


2.5 Let $X_n$ be the Markov chain with state space $\mathbb{Z}$ and transition probability

$$p(n, n+1) = p, \quad p(n, n-1) = 1 - p,$$

where $p > 1/2$. Assume $X_0 = 0$.

(a) Let $Y = \min\{X_0, X_1, \ldots\}$. What is the distribution of $Y$?

(b) For positive integer $k$, let $T_k = \min\{n : X_n = k\}$ and let $e(k) = E[T_k]$. Explain why $e(k) = k\,e(1)$.

(c) Find $e(1)$. (Hint: (b) might be helpful.)

(d) Use (c) to give another proof that $e(1) = \infty$ if $p = 1/2$.


2.6 Suppose $J_1, J_2, \ldots$ are independent random variables with $P\{J_j = 1\} = 1 - P\{J_j = 0\} = p$. Let $k$ be a positive integer and let $T_k$ be the first time that $k$ consecutive 1s have appeared. In other words, $T_k = n$ if $J_n = J_{n-1} = \cdots = J_{n-(k-1)} = 1$ and there is no $m < n$ such that $J_m = J_{m-1} = \cdots = J_{m-(k-1)} = 1$. Let $X_0 = 0$ and for $n > 0$, let $X_n$ be the number of consecutive 1s in the last run, i.e., $X_n = k$ if $J_{n-k} = 0$ and $J_i = 1$ for $n - k < i \le n$.

(a) Explain why $X_n$ is a Markov chain with state space $\{0, 1, 2, \ldots\}$ and give the transition probabilities.

(b) Show that the chain is irreducible and positive recurrent and give the invariant probability $\pi$.

(c) Find $E[T_k]$ by writing an equation for $E[T_k]$ in terms of $E[T_{k-1}]$ and then solving the recursive equation.

(d) Find $E[T_k]$ in a different way. Suppose the chain starts in state $k$, and let $\tilde T_k$ be the time until returning to state $k$ and $\tilde T_0$ the time until the chain reaches state 0. Explain why

$$E[\tilde T_k] = E[\tilde T_0] + E[T_k],$$

find $E[\tilde T_0]$, and use part (b) to determine $E[T_k]$.


2.7 Let $X_n$ be a Markov chain with state space $S = \{0, 1, 2, \ldots\}$. For each of the following transition probabilities, state if the chain is positive recurrent, null recurrent, or transient. If it is positive recurrent, give the stationary probability distribution:

(a) $p(x, 0) = 1/(x+2)$, $p(x, x+1) = (x+1)/(x+2)$;
(b) $p(x, 0) = (x+1)/(x+2)$, $p(x, x+1) = 1/(x+2)$;
(c) $p(x, 0) = 1/(x^2+2)$, $p(x, x+1) = (x^2+1)/(x^2+2)$.


2.8 Given a branching process with the following offspring distributions, 
determine the extinction probability a. 

(a) $p_0 = .25$, $p_1 = .4$, $p_2 = .35$.

(b) $p_0 = .5$, $p_1 = .1$, $p_3 = .4$.

(c) [...] .01, $p_3 = .01$, $p_6 = .01$, $p_{13} = .01$.

(d) $p_i = (1 - q)q^i$, for some $0 < q < 1$.


2.9 Consider the branching process with offspring distribution as in Exercise 
2.8(b) and suppose Xo = 1. 

(a) What is the probability that the population is extinct in the second generation ($X_2 = 0$), given that it did not die out in the first generation ($X_1 > 0$)?

(b) What is the probability that the population is extinct in the third 
generation, given that it was not extinct in the second generation? 


2.10 Consider a branching process with offspring distribution given by $\{p_n\}$. We will make the process into an irreducible Markov chain by asserting that if the population ever dies out, then the next generation will have one new individual [in other words, $p(0,1) = 1$]. For which $\{p_n\}$ will this chain be positive recurrent, null recurrent, transient?


2.11 Consider the following variation of the branching process. At each time $n$, each individual produces offspring independently using offspring distribution $\{p_n\}$, and then the individual dies with probability $q \in (0,1)$. Hence, each individual reproduces $j$ times where $j$ is the lifetime of the individual. For which values of $q$, $\{p_n\}$ do we have eventual extinction with probability one?


2.12 Consider the branching process with $p_0 = 1/3$, $p_1 = 1/3$, $p_2 = 1/3$. Find, with the aid of a computer, the probability that the population dies out after $n$ steps where $n = 20, 100, 200, 1000, 1500, 2000, 5000$. Do the same with values $p_0 = .35$, $p_1 = .33$, $p_2 = .32$, and then do it with values $p_0 = .32$, $p_1 = .33$, $p_2 = .35$.


2.13 Consider a population of animals with the following rule for (asexual) 
reproduction: an individual that is born has probability q of surviving long 
enough to produce offspring. If the individual does produce offspring, she 
produces one or two offspring, each with equal probability. After this the 
individual no longer reproduces and eventually dies. Suppose the population 
starts with four individuals. 

(a) For which values of q is it guaranteed that the population will eventually 
die out? 
(b) If $q = .9$, what is the probability that the population survives forever?


2.14 Let $X_n$ be the number of individuals at time $n$ of a branching process with $\mu > 1$. Assume $X_0 = 1$. Let $\phi$ be the generating function for the offspring distribution, and let $a < 1$ be the extinction probability.

(a) Explain why $\phi'(a) < 1$.
(b) Let $a_n = P\{X_n = 0\}$. Using part (a) show that there is a $\rho < 1$ such that for all $n$ sufficiently large

$$a - a_{n+1} \le \rho\,(a - a_n).$$

(c) Show that there exist $b > 0$, $c < \infty$ such that for all $n$,

$$P\{\text{extinction} \mid X_n \ne 0\} \le c\,e^{-bn}.$$


In other words, if the population is going to go extinct it is very likely to do 
it in the first few generations. 


2.15 Let $X_1, X_2, \ldots$ be independent identically distributed random variables taking values in the integers with mean 0. Let $S_0 = 0$ and

$$S_n = X_1 + \cdots + X_n.$$

(a) Let

$$G_n(x) = E\left[\sum_{j=0}^{n} I\{S_j = x\}\right]$$

be the expected number of visits to $x$ in the first $n$ steps. Show that for all $n$ and $x$, $G_n(0) \ge G_n(x)$. (Hint: consider the first $j$ with $S_j = x$.)
(b) Recall that the law of large numbers implies that for each $\epsilon > 0$,

$$\lim_{n \to \infty} P\{|S_n| \le n\epsilon\} = 1.$$

Show that this implies that for every $\epsilon > 0$,


(c) Using (a) and (b), show that for each $M < \infty$ there is an $n$ such that $G_n(0) \ge M$.
(d) Conclude that $S_n$ is a recurrent Markov chain.


2.16 Let $p_1, p_0, p_{-1}, \ldots$ be a probability distribution on $\{\ldots, -2, -1, 0, 1\}$ with negative mean

$$\sum_n n\,p_n = \mu < 0.$$

Define a Markov chain $X_n$ on the nonnegative integers with transition probabilities

$$p(n, m) = p_{m-n}, \quad m > 0,$$

$$p(n, 0) = \sum_{m \le 0} p_{m-n}.$$

In other words, $X_n$ acts like a random walk with increments given by the $p_i$, except that the walk is forbidden to jump below 0. The purpose of this exercise is to show that the chain is positive recurrent.

(a) Let $\pi(n)$ be an invariant probability for the chain. Show that for each $n > 0$,

$$\pi(n) = \sum_{m=n-1}^{\infty} \pi(m)\,p_{n-m}.$$

(b) Let $q_n = p_{1-n}$. Show there exists an $\alpha \in (0,1)$ such that

$$\alpha = q_0 + q_1\alpha + q_2\alpha^2 + \cdots.$$

(Hint: $q_n$ is the probability distribution of a random variable with mean greater than 1. The right-hand side is the generating function of the $q_n$.)

(c) Use the $\alpha$ from (b) to find the invariant probability distribution for the chain.


2.17 Let $p(x, y)$ be the transition probability for a Markov chain on a state space $S$. Call a function $f$ superharmonic at $x$ for $p$ if

$$\sum_{y \in S} p(x, y)\,f(y) \le f(x).$$

Fix a state $z \in S$.
(a) Let $\mathcal{A}$ be the set of all functions $f$ with $f(z) = 1$; $0 \le f(y) \le 1$ for all $y \in S$; and that are superharmonic at all $y \ne z$. Let $g$ be defined by

$$g(x) = \inf_{f \in \mathcal{A}} f(x).$$

Show that $g \in \mathcal{A}$.
(b) Show that for all $x \ne z$,

$$\sum_{y \in S} p(x, y)\,g(y) = g(x).$$

[Hint: suppose $\sum_y p(x, y)\,g(y) < g(x)$ for some $x$. Show how you can decrease $g$ a little at $x$ so that the function stays superharmonic.]
(c) Let $g$ be as in (a). Show that if $g(x) < 1$ for some $x$, then

$$\inf_{x \in S} g(x) = 0.$$

[Hint: let $\epsilon = \inf_x g(x)$ and consider $h(x) = (g(x) - \epsilon)/(1 - \epsilon)$.]

(d) Conclude the following: suppose that an irreducible Markov chain with transition probabilities $p(x, y)$ is given and there is a function $f$ that is superharmonic for $p$ at all $y \ne z$; $f(z) = 1$; $0 \le f(y) \le 1$, $y \in S$; and such that $f(x) < 1$ for some $x \in S$. Then the chain is transient.


2.18 In this exercise, we will establish Stirling’s formula

$$n! \sim \sqrt{2\pi}\, n^{n + (1/2)}\, e^{-n}. \qquad (2.15)$$

Let $X_1, X_2, \ldots$ be independent Poisson random variables with mean 1 and let $Y_n = X_1 + \cdots + X_n$, which is a Poisson random variable with mean $n$. Let

$$p(n, k) = P\{Y_n = k\} = e^{-n}\,\frac{n^k}{k!}.$$

(a) Use the central limit theorem to show that if $a > 0$,

$$\lim_{n \to \infty} \sum_{n \le k < n + a\sqrt{n}} p(n, k) = \int_0^a \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx.$$

(b) Show that if $a > 0$, $n$ is a positive integer, and $n \le k < n + a\sqrt{n}$, then

$$e^{-a^2}\, p(n, n) \le p(n, k) \le p(n, n).$$

(c) Use (a) and (b) to conclude that

$$p(n, n) \sim \frac{1}{\sqrt{2\pi n}}.$$

Stirling’s formula (2.15) follows immediately.


Chapter 3


Continuous-Time Markov Chains


3.1 Poisson Process 


Consider $X_t$, the number of customers arriving at a store by time $t$. Time is
now continuous so t takes values in the nonnegative real numbers. Suppose we 
make three assumptions about the rate at which customers arrive. Intuitively, 
they are as follows: 


1. The number of customers arriving during one time interval does not 
affect the number arriving during a different time interval. 


2. The “average” rate at which customers arrive remains constant. 
3. Customers arrive one at a time. 


We now make these assumptions mathematically precise. The first assumption is easy: for $s_1 \le t_1 \le s_2 \le t_2 \le \cdots \le s_n \le t_n$, the random variables $X_{t_1} - X_{s_1}, \ldots, X_{t_n} - X_{s_n}$ are independent. For the second two assumptions, let $\lambda$ be the rate at which customers arrive, i.e., on the average we expect $\lambda t$ customers in time $t$. In a small time interval $[t, t + \Delta t]$, we expect that a new customer arrives with probability about $\lambda \Delta t$. The third assumption states that the probability that more than one customer comes in during a small time interval is significantly smaller than this. Rigorously, this becomes

$$P\{X_{t+\Delta t} = X_t\} = 1 - \lambda\Delta t + o(\Delta t), \qquad (3.1)$$

$$P\{X_{t+\Delta t} = X_t + 1\} = \lambda\Delta t + o(\Delta t), \qquad (3.2)$$

$$P\{X_{t+\Delta t} \ge X_t + 2\} = o(\Delta t). \qquad (3.3)$$

Here $o(\Delta t)$ represents some function that is much smaller than $\Delta t$ for $\Delta t$ small, i.e.,

$$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0.$$

A stochastic process $X_t$ with $X_0 = 0$ satisfying these assumptions is called a Poisson process with rate parameter $\lambda$.




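We will now determine the distribution of $X_t$. We will actually derive the distribution in two different ways. First, consider a large number $n$ and write

$$X_t = \sum_{j=1}^{n} \left[X_{jt/n} - X_{(j-1)t/n}\right]. \qquad (3.4)$$

We have written $X_t$ as the sum of $n$ independent, identically distributed random variables. If $n$ is large, the probability that any of these random variables is 2 or more is small; in fact,

$$P\{X_{jt/n} - X_{(j-1)t/n} \ge 2 \text{ for some } j \le n\} \le \sum_{j=1}^{n} P\{X_{jt/n} - X_{(j-1)t/n} \ge 2\} = n\,P\{X_{t/n} \ge 2\}.$$

The last term goes to 0 as $n \to \infty$ by (3.3). Hence we can approximate the sum in (3.4) by a sum of independent random variables which equal 1 with probability $\lambda(t/n)$ and 0 with probability $1 - \lambda(t/n)$. By the formula for the binomial distribution,

$$P\{X_t = k\} \approx \binom{n}{k} \left(\frac{\lambda t}{n}\right)^k \left(1 - \frac{\lambda t}{n}\right)^{n-k}.$$

Rigorously, we can then show:

$$P\{X_t = k\} = \lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda t}{n}\right)^k \left(1 - \frac{\lambda t}{n}\right)^{n-k}.$$

To take this limit, note that

$$\lim_{n \to \infty} \binom{n}{k} n^{-k} = \lim_{n \to \infty} \frac{n(n-1)\cdots(n-k+1)}{k!\, n^k} = \frac{1}{k!}$$

and

$$\lim_{n \to \infty} \left(1 - \frac{\lambda t}{n}\right)^{n-k} = \lim_{n \to \infty} \left(1 - \frac{\lambda t}{n}\right)^{n} \lim_{n \to \infty} \left(1 - \frac{\lambda t}{n}\right)^{-k} = e^{-\lambda t}.$$

Hence,

$$P\{X_t = k\} = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!},$$

i.e., $X_t$ has a Poisson distribution with parameter $\lambda t$.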
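
The binomial-to-Poisson limit can be checked numerically. The sketch below is illustrative only; the helper names and the particular values of the rate, time, and $k$ are arbitrary choices, not taken from the text.

```python
from math import comb, exp, factorial


def binomial_pmf(n, k, p):
    """P{Bin(n, p) = k}."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """P{Poisson(mean) = k}."""
    return exp(-mean) * mean**k / factorial(k)


lam, t, k = 2.0, 1.5, 4          # assumed example values
limit = poisson_pmf(k, lam * t)
# Errors |binomial - Poisson| for increasingly fine subdivisions of [0, t].
errors = [abs(binomial_pmf(n, k, lam * t / n) - limit)
          for n in (10, 100, 1000, 10000)]
```

The entries of `errors` shrink roughly in proportion to $1/n$, which is consistent with the limit taken above.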
We now derive this formula in a different way. Let

$$P_k(t) = P\{X_t = k\}.$$

Note that $P_0(0) = 1$ and $P_k(0) = 0$, $k > 0$. Equations (3.1) through (3.3) can be used to give a system of differential equations for $P_k(t)$. The definition of the derivative gives

$$P_k'(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\left(P\{X_{t+\Delta t} = k\} - P\{X_t = k\}\right).$$

Note that

$$\begin{aligned}
P\{X_{t+\Delta t} = k\} &= P\{X_t = k\}\,P\{X_{t+\Delta t} = k \mid X_t = k\} \\
&\quad + P\{X_t = k-1\}\,P\{X_{t+\Delta t} = k \mid X_t = k-1\} \\
&\quad + P\{X_t \le k-2\}\,P\{X_{t+\Delta t} = k \mid X_t \le k-2\} \\
&= P_k(t)\,(1 - \lambda\Delta t) + P_{k-1}(t)\,\lambda\Delta t + o(\Delta t).
\end{aligned}$$

Therefore,

$$P_k'(t) = \lambda P_{k-1}(t) - \lambda P_k(t).$$

We can solve these equations recursively. For $k = 0$, the differential equation

$$P_0'(t) = -\lambda P_0(t), \quad P_0(0) = 1$$

has the solution

$$P_0(t) = e^{-\lambda t}.$$

To solve for $k > 0$ it is convenient to consider

$$f_k(t) = e^{\lambda t} P_k(t).$$

Then $f_0(t) = 1$ and the differential equation becomes

$$f_k'(t) = \lambda f_{k-1}(t), \quad f_k(0) = 0.$$

It is then easy to check inductively that the solution is

$$f_k(t) = \frac{(\lambda t)^k}{k!},$$

and hence

$$P_k(t) = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!},$$

which is what we derived previously.
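
For completeness, here is one way to write out the two steps that were left to the reader: the change of variables $f_k(t) = e^{\lambda t}P_k(t)$ and the inductive step, assuming the formula holds for $k - 1$.

```latex
\begin{align*}
f_k'(t) &= \lambda e^{\lambda t} P_k(t) + e^{\lambda t} P_k'(t)
         = \lambda e^{\lambda t} P_k(t)
           + e^{\lambda t}\bigl[\lambda P_{k-1}(t) - \lambda P_k(t)\bigr]
         = \lambda f_{k-1}(t), \\
f_k(t)  &= f_k(0) + \int_0^t \lambda f_{k-1}(s)\,ds
         = \int_0^t \lambda\,\frac{(\lambda s)^{k-1}}{(k-1)!}\,ds
         = \frac{(\lambda t)^k}{k!}.
\end{align*}
```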

Another way to view the Poisson process is to consider the waiting times between customers. Let $T_n$, $n = 1, 2, \ldots$ be the time between the arrivals of the $(n-1)$st and $n$th customers. Let $Y_n = T_1 + \cdots + T_n$ be the total amount of time until $n$ customers arrive. We can write

$$Y_n = \inf\{t : X_t = n\},$$

$$T_n = Y_n - Y_{n-1}.$$

Here inf stands for “infimum” or greatest lower bound, which is the generalization of minimum for infinite sets; e.g., the infimum of the set of positive numbers is 0. The $T_i$ should be independent, identically distributed random variables. One property that the $T_i$ should satisfy is the loss of memory property: if we have waited $s$ time units for a customer and no one has arrived, the chance that a customer will come in the next $t$ time units is exactly the same as if there had been some customers before. Mathematically, this property is written

$$P\{T_i \ge s + t \mid T_i \ge s\} = P\{T_i \ge t\}.$$


Writing $f(t) = P\{T_i \ge t\}$, this says $f(s+t) = f(s)f(t)$. The only continuous real-valued functions, not identically zero, satisfying $f(s+t) = f(s)f(t)$ are of the form $f(t) = e^{-bt}$. Hence the distribution of $T_i$ must be an exponential distribution with parameter $b$. [Recall that a random variable $Z$ has an exponential distribution with rate parameter $b$ if it has density

$$f(z) = b\,e^{-bz}, \quad 0 < z < \infty,$$

or equivalently, if it has distribution function

$$F(z) = P\{Z \le z\} = 1 - e^{-bz}, \quad z \ge 0.$$

An easy calculation gives $E(Z) = 1/b$.] It is easy to see what $b$ should be. For large $t$ values we expect there to be about $\lambda t$ customers. Hence, $Y_{\lambda t} \approx t$. But $Y_n \approx n\,E(T_i) = n/b$. Hence $\lambda = b$. This gives a means of constructing a Poisson process: take independent random variables $T_1, T_2, \ldots$, each exponential with rate $\lambda$, and define

$$Y_n = T_1 + \cdots + T_n,$$

$$X_t = n \quad \text{if } Y_n \le t < Y_{n+1}.$$

From this we could then conclude in a third way that the random variables $X_t$ have a Poisson distribution. Conversely, given that we already have the Poisson process, it is easy to compute the distribution of $T_1$ since

$$P\{T_1 > t\} = P\{X_t = 0\} = e^{-\lambda t}.$$
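
The construction from exponential waiting times translates directly into a simulation. The following is a rough sketch using only Python's standard `random` module; the function name `poisson_count`, the seed, and the parameter values are assumptions made for illustration.

```python
import random
from math import exp


def poisson_count(rate, t, rng):
    """Value of X_t when the waiting times T_1, T_2, ... are
    independent exponentials with the given rate."""
    n = 0
    arrival = rng.expovariate(rate)      # Y_1 = T_1
    while arrival <= t:
        n += 1
        arrival += rng.expovariate(rate)  # Y_{n+1} = Y_n + T_{n+1}
    return n


rng = random.Random(0)
rate, t, trials = 2.0, 3.0, 20000
counts = [poisson_count(rate, t, rng) for _ in range(trials)]

sample_mean = sum(counts) / trials                  # compare with rate * t
frac_zero = sum(c == 0 for c in counts) / trials    # compare with exp(-rate * t)
expected_mean = rate * t
expected_zero = exp(-rate * t)
```

With these values the sample mean should land near $\lambda t = 6$ and the fraction of runs with no arrivals near $e^{-6}$, up to Monte Carlo error.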


3.2 Finite State Space 


In this section we discuss continuous-time Markov chains on a finite state 
space. We start by discussing some facts about exponential random variables. 
Suppose $T_1, \ldots, T_n$ are independent random variables, each exponential with rates $b_1, \ldots, b_n$, respectively. Intuitively, we can think of $n$ alarm clocks which will go off at times $T_1, \ldots, T_n$. Consider the first time when any of the alarm clocks goes off; more precisely, consider the random variable

$$T = \min\{T_1, \ldots, T_n\}.$$

Note that

$$\begin{aligned}
P\{T \ge t\} &= P\{T_1 \ge t, \ldots, T_n \ge t\} \\
&= P\{T_1 \ge t\}\,P\{T_2 \ge t\} \cdots P\{T_n \ge t\} \\
&= e^{-b_1 t}\, e^{-b_2 t} \cdots e^{-b_n t} = e^{-(b_1 + \cdots + b_n)t}.
\end{aligned}$$

In other words, $T$ has an exponential distribution with parameter $b_1 + \cdots + b_n$. Moreover, it is easy to give the probabilities for which of the clocks goes off first,

$$\begin{aligned}
P\{T_1 = T\} &= \int_0^\infty P\{T_2 > t, \ldots, T_n > t\}\,dP\{T_1 = t\} \\
&= \int_0^\infty e^{-(b_2 + \cdots + b_n)t}\, b_1 e^{-b_1 t}\,dt \\
&= \frac{b_1}{b_1 + \cdots + b_n}.
\end{aligned}$$

In other words, the probability that the $i$th clock goes off first is the ratio of $b_i$ to $b_1 + \cdots + b_n$. If we are given an infinite sequence of exponential random variables $T_1, T_2, \ldots$, with parameters $b_1, b_2, \ldots$, the same result holds provided that $b_1 + b_2 + \cdots < \infty$.
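
Both facts about the alarm clocks are easy to test by simulation. This is only an illustrative sketch; the rates, seed, and the helper name `first_alarm` are arbitrary assumptions.

```python
import random


def first_alarm(rates, rng):
    """Sample independent exponential clocks; return (index of the
    clock that rings first, the time at which it rings)."""
    times = [rng.expovariate(b) for b in rates]
    winner = min(range(len(rates)), key=times.__getitem__)
    return winner, times[winner]


rng = random.Random(1)
rates = [1.0, 2.0, 3.0]
trials = 30000

wins = [0] * len(rates)
total_time = 0.0
for _ in range(trials):
    i, ring_time = first_alarm(rates, rng)
    wins[i] += 1
    total_time += ring_time

win_freq = [w / trials for w in wins]   # compare with b_i / (b_1 + ... + b_n)
mean_min = total_time / trials          # compare with 1 / (b_1 + ... + b_n)
```

For the rates above, the winning frequencies should be close to $1/6$, $1/3$, $1/2$ and the mean of the minimum close to $1/6$.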

Suppose now that we have a finite state space $S$. We will define a continuous-time process $X_t$ on $S$ that has the Markov property,

$$P\{X_t = y \mid X_r,\ 0 \le r \le s\} = P\{X_t = y \mid X_s\},$$

and that is time-homogeneous,

$$P\{X_t = y \mid X_s = x\} = P\{X_{t-s} = y \mid X_0 = x\}.$$

For each $x, y \in S$, $x \ne y$ we assign a nonnegative number $\alpha(x, y)$ that we think of as the rate at which the chain changes from state $x$ to state $y$. We let $\alpha(x)$ denote the total rate at which the chain is changing from state $x$, i.e.,

$$\alpha(x) = \sum_{y \ne x} \alpha(x, y).$$

A (time-homogeneous) continuous-time Markov chain with rates $\alpha$ is a stochastic process $X_t$ taking values in $S$ satisfying

$$P\{X_{t+\Delta t} = x \mid X_t = x\} = 1 - \alpha(x)\Delta t + o(\Delta t), \qquad (3.5)$$

$$P\{X_{t+\Delta t} = x \mid X_t = y\} = \alpha(y, x)\Delta t + o(\Delta t), \quad y \ne x. \qquad (3.6)$$


In other words, the probability that the chain in state $y$ jumps to a different state $x$ in a small time interval of length $\Delta t$ is about $\alpha(y, x)\Delta t$. For the Poisson process, we used the description for small $\Delta t$ to write differential equations for the probabilities. We do the same in this case. If we let $p_x(t) = P\{X_t = x\}$, then the equations above can be shown to give a system of linear differential equations,

$$p_x'(t) = -\alpha(x)\,p_x(t) + \sum_{y \ne x} \alpha(y, x)\,p_y(t).$$

If we impose an initial condition, $p_x(0)$, $x \in S$, then we can solve the system. This system is often written in matrix form. Let $A$ be the matrix whose $(x, y)$ entry equals $\alpha(x, y)$ if $x \ne y$ and equals $-\alpha(x)$ if $x = y$. Then if $\bar p(t)$ denotes the (row) vector of probabilities, the system can be written

$$\bar p'(t) = \bar p(t)\,A. \qquad (3.7)$$

The matrix $A$ is called the infinitesimal generator of the chain. Note that the row sums of $A$ equal 0, the nondiagonal entries of $A$ are nonnegative, and the diagonal entries are nonpositive. From differential equations (see Section 0.2), we can give the solution

$$\bar p(t) = \bar p(0)\,e^{tA}.$$

We can also write this in terms of transition matrices. Let $p_t(x, y) = P\{X_t = y \mid X_0 = x\}$ and let $P_t$ be the matrix whose $(x, y)$ entry is $p_t(x, y)$. The system of differential equations can be written as a single matrix equation:

$$\frac{d}{dt} P_t = P_t A, \quad P_0 = I. \qquad (3.8)$$

The matrix $P_t$ is then given by

$$P_t = e^{tA}.$$

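Example 1. Consider a chain with two states, 0 and 1. Assume $\alpha(0,1) = 1$ and $\alpha(1,0) = 2$. Then the infinitesimal generator (rows and columns indexed by the states 0, 1) is

$$A = \begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}.$$

In order to compute $e^{tA}$ we diagonalize the matrix. The eigenvalues are $0, -3$. We can write

$$D = Q^{-1} A Q,$$

where

$$D = \begin{pmatrix} 0 & 0 \\ 0 & -3 \end{pmatrix}, \quad
Q = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}, \quad
Q^{-1} = \begin{pmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{pmatrix}.$$

We use the diagonalization to compute the exponential $e^{tA}$:

$$P_t = e^{tA} = \sum_{n=0}^{\infty} \frac{(tA)^n}{n!} = Q\,e^{tD}\,Q^{-1}
= Q \begin{pmatrix} 1 & 0 \\ 0 & e^{-3t} \end{pmatrix} Q^{-1}
= \begin{pmatrix} 2/3 & 1/3 \\ 2/3 & 1/3 \end{pmatrix}
+ e^{-3t} \begin{pmatrix} 1/3 & -1/3 \\ -2/3 & 2/3 \end{pmatrix}.$$

Note that

$$\lim_{t \to \infty} P_t = \begin{pmatrix} \pi \\ \pi \end{pmatrix},$$

where $\pi = (2/3, 1/3)$.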
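
The closed form in Example 1 can be compared against a direct numerical evaluation of the series $\sum (tA)^n/n!$. The sketch below is one possible way to do this in pure Python; it uses scaling and squaring (a standard numerical device, not the diagonalization used in the text), and the helper names are illustrative.

```python
from math import exp


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(m, terms=30, squarings=10):
    """exp(m): Taylor series of m / 2**squarings, then repeated squaring."""
    n = len(m)
    scaled = [[x / 2**squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + s for r, s in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_matrix(generator, t):
    """P_t = exp(t A) for an infinitesimal generator A."""
    return mat_exp([[t * x for x in row] for row in generator])


A = [[-1.0, 1.0], [2.0, -2.0]]
t = 0.7
P = transition_matrix(A, t)
closed_form = [[2/3 + exp(-3*t)/3, 1/3 - exp(-3*t)/3],
               [2/3 - 2*exp(-3*t)/3, 1/3 + 2*exp(-3*t)/3]]
```

The matrices `P` and `closed_form` agree to within floating-point error, and for large `t` both rows of `P` approach $(2/3, 1/3)$.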


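Example 2. Consider a chain with four states, 0, 1, 2, 3, and infinitesimal generator (rows and columns indexed by the states 0, 1, 2, 3)

$$A = \begin{pmatrix}
-1 & 1 & 0 & 0 \\
1 & -3 & 1 & 1 \\
0 & 1 & -2 & 1 \\
0 & 1 & 1 & -2
\end{pmatrix}.$$

The eigenvalues of $A$ are $0, -1, -3, -4$ with right eigenvectors (which are left eigenvectors as well since $A$ is symmetric) $(1, 1, 1, 1)$, $(1, 0, -1/2, -1/2)$, $(0, 0, -1/2, 1/2)$, and $(-1/3, 1, -1/3, -1/3)$. Then,

$$D = Q^{-1} A Q,$$

where

$$D = \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -3 & 0 \\
0 & 0 & 0 & -4
\end{pmatrix}, \quad
Q = \begin{pmatrix}
1 & 1 & 0 & -1/3 \\
1 & 0 & 0 & 1 \\
1 & -1/2 & -1/2 & -1/3 \\
1 & -1/2 & 1/2 & -1/3
\end{pmatrix},$$

$$Q^{-1} = \begin{pmatrix}
1/4 & 1/4 & 1/4 & 1/4 \\
2/3 & 0 & -1/3 & -1/3 \\
0 & 0 & -1 & 1 \\
-1/4 & 3/4 & -1/4 & -1/4
\end{pmatrix}.$$

Therefore,

$$\begin{aligned}
P_t = e^{tA} = Q\,e^{tD}\,Q^{-1} ={}&
\begin{pmatrix}
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4
\end{pmatrix}
+ e^{-t} \begin{pmatrix}
2/3 & 0 & -1/3 & -1/3 \\
0 & 0 & 0 & 0 \\
-1/3 & 0 & 1/6 & 1/6 \\
-1/3 & 0 & 1/6 & 1/6
\end{pmatrix} \\
&+ e^{-3t} \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1/2 & -1/2 \\
0 & 0 & -1/2 & 1/2
\end{pmatrix}
+ e^{-4t} \begin{pmatrix}
1/12 & -1/4 & 1/12 & 1/12 \\
-1/4 & 3/4 & -1/4 & -1/4 \\
1/12 & -1/4 & 1/12 & 1/12 \\
1/12 & -1/4 & 1/12 & 1/12
\end{pmatrix}.
\end{aligned}$$

Note that

$$\lim_{t \to \infty} P_t = \begin{pmatrix}
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4
\end{pmatrix}.$$
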

We can use exponential waiting times to give an alternative description of the Markov chain. Suppose rates $\alpha(x, y)$ have been given. Suppose $X_0 = x$. Let

$$T = \inf\{t : X_t \ne x\},$$

i.e., $T$ is the time at which the process first changes state. The Markov property can be used to see that $T$ must have the loss of memory property, and hence $T$ must have an exponential distribution. By (3.5),

$$P\{T \le \Delta t\} = \alpha(x)\Delta t + o(\Delta t).$$

In order for this to be true, $T$ must be exponential with parameter $\alpha(x)$. What state does the chain move to? The infinitesimal characterization (3.6) can be used to check that the probability that the state changes to $y$ is exactly $\alpha(x, y)/\alpha(x)$. By the discussion of exponential distributions above we can think of this in another way. Independent “alarm clocks” are placed at each state $y$, with each alarm going off at an exponential time with rate $\alpha(x, y)$. The chain stays in state $x$ until the first such clock goes off and then it moves to the state corresponding to that clock.
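
This description gives a direct recipe for simulating the chain: hold for an exponential time with parameter $\alpha(x)$, then jump to $y$ with probability $\alpha(x, y)/\alpha(x)$. A minimal sketch follows, assuming rates are stored in a dictionary of dictionaries; this data layout and the name `simulate_chain` are illustrative choices. The two-state rates are those of Example 1.

```python
import random
from math import exp


def simulate_chain(rates, start, t_end, rng):
    """Return the state at time t_end.

    rates[x] maps each state y != x to the jump rate alpha(x, y).
    """
    state, clock = start, 0.0
    while True:
        total = sum(rates[state].values())      # alpha(x)
        if total == 0:                          # absorbing state
            return state
        clock += rng.expovariate(total)         # holding time at x
        if clock > t_end:
            return state
        targets = list(rates[state])
        weights = [rates[state][y] for y in targets]
        # Jump to y with probability alpha(x, y) / alpha(x).
        state = rng.choices(targets, weights=weights)[0]


rates = {0: {1: 1.0}, 1: {0: 2.0}}              # Example 1 of this section
rng = random.Random(2)
t, trials = 0.7, 20000
hits = sum(simulate_chain(rates, 0, t, rng) == 0 for _ in range(trials))
estimate = hits / trials
exact = 2/3 + exp(-3 * t) / 3                   # p_t(0, 0) from Example 1
```

The Monte Carlo estimate should agree with the exact value of $p_t(0,0)$ from Example 1 up to sampling error of order $1/\sqrt{\text{trials}}$.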

As in the case for discrete time, we are interested in the large-time behavior. As Examples 1 and 2 in this section demonstrate, we expect

$$\lim_{t \to \infty} P_t = \Pi = \begin{pmatrix} \pi \\ \vdots \\ \pi \end{pmatrix},$$

where $\pi$ represents a limiting probability. The limiting probability should not change with time; hence, by (3.7),

$$\pi A = 0.$$

In this case, $\pi$ is a left eigenvector of $A$ with eigenvalue 0. The limit theory now parallels that for discrete time. Suppose for ease that the chain is irreducible. [A continuous-time Markov chain is irreducible if all states communicate, i.e., for each $x, y \in S$, there exist $z_1, \ldots, z_j \in S$ with $\alpha(x, z_1), \alpha(z_1, z_2), \ldots, \alpha(z_{j-1}, z_j), \alpha(z_j, y)$ all strictly positive.] In this case, one can show (see Exercise 3.4) using the results for stochastic matrices that:


1. There is a unique probability vector $\pi$ satisfying

$$\pi A = 0.$$

2. All other eigenvalues of $A$ have negative real part.


By analyzing the matrix differential equation it is not too difficult to show that

$$\lim_{t \to \infty} P_t = \begin{pmatrix} \pi \\ \vdots \\ \pi \end{pmatrix}.$$


If the chain is reducible, we must analyze the chain on each communication class. We have not discussed periodicity. This phenomenon does not occur for continuous-time chains; in fact, one can prove (see Exercise 3.7) that for any irreducible continuous-time chain, $P_t$ has strictly positive entries for all $t > 0$.

A number of the methods for analyzing discrete-time chains have analogues for continuous-time chains. Suppose $X_t$ is an irreducible continuous-time chain on finite state space $S$ and suppose $z$ is some fixed state in $S$. We will compute the mean passage time to $z$ starting at state $x$, i.e., $b(x) = E(Y \mid X_0 = x)$, where

$$Y = \inf\{t : X_t = z\}.$$

Clearly, $b(z) = 0$. For $x \ne z$, assume $X_0 = x$ and let $T$ be the first time that the chain changes state as above. Then

$$E(Y \mid X_0 = x) = E(T \mid X_0 = x) + \sum_{y \in S} P\{X_T = y \mid X_0 = x\}\,E(Y \mid X_0 = y).$$

Since $T$ is exponential with parameter $\alpha(x)$ the first term on the right-hand side equals $1/\alpha(x)$. Also from the above discussion, $P\{X_T = y \mid X_0 = x\} = \alpha(x, y)/\alpha(x)$. Finally, since $b(z) = 0$, we do not need to include the $y = z$ term in the sum. Therefore, the equation becomes

$$\alpha(x)\,b(x) = 1 + \sum_{y \ne x, z} \alpha(x, y)\,b(y).$$

If we let $\tilde A$ be the matrix obtained from $A$ by deleting the row and column associated to the state $z$, we get the matrix equation

$$0 = \bar 1 + \tilde A\,b, \quad \text{or} \quad b = [-\tilde A]^{-1}\,\bar 1,$$

where $\bar 1$ denotes the vector of all ones. (The matrix $\tilde A$ is a square matrix whose row sums are all nonpositive and at least one of whose row sums is strictly negative. From this one can conclude that all the eigenvalues of $\tilde A$ have strictly negative real part, and hence $\tilde A$ is invertible.)


Example 3. Consider Example 2 in this section and let us compute the expected time to get from state 0 to state 3. Then $z = 3$,

$$\tilde A = \begin{pmatrix}
-1 & 1 & 0 \\
1 & -3 & 1 \\
0 & 1 & -2
\end{pmatrix}$$

(rows and columns indexed by the states 0, 1, 2), and

$$b = [-\tilde A]^{-1}\,\bar 1 = (8/3, 5/3, 4/3).$$

Therefore the expected time to get from state 0 to state 3 is 8/3.
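
The linear system for the mean passage times is small enough to solve exactly. The sketch below uses Python's `fractions` module and a hand-written Gauss-Jordan elimination; the helper names `solve` and `mean_passage_times` are assumptions made for illustration, and the generator is the one from Example 2.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination.
    The matrix is assumed to be square and invertible."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def mean_passage_times(generator, target):
    """b(x) = expected time to reach `target` from each state x != target,
    obtained by deleting the target's row and column and solving
    (-A_tilde) b = 1."""
    keep = [s for s in range(len(generator)) if s != target]
    minus_a_tilde = [[-generator[x][y] for y in keep] for x in keep]
    return dict(zip(keep, solve(minus_a_tilde, [1] * len(keep))))


A = [[-1, 1, 0, 0],
     [1, -3, 1, 1],
     [0, 1, -2, 1],
     [0, 1, 1, -2]]
b = mean_passage_times(A, target=3)
```

Here `b[0]`, `b[1]`, `b[2]` come out as the exact fractions 8/3, 5/3, 4/3, matching the vector computed in Example 3.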


3.3 Birth-and-Death Processes


In this section we consider a large class of infinite state space, continuous-time Markov chains that are known by the name of birth-and-death processes. The state space will be $\{0, 1, 2, \ldots\}$, and changes of state will always be from $n$ to $n + 1$ or $n$ to $n - 1$. Intuitively we can view the state of the system as the size of a population that can increase or decrease by 1 by a “birth” or a “death,” respectively. To describe the chain, we give birth rates $\lambda_n$, $n = 0, 1, 2, \ldots$ and death rates $\mu_n$, $n = 1, 2, 3, \ldots$. If the population is currently $n$, then new individuals arrive at rate $\lambda_n$ and individuals leave at rate $\mu_n$ (note if the population is 0 there can be no deaths, so $\mu_0 = 0$).
If we let $X_t$ denote the state of the chain at time $t$, then

$$P\{X_{t+\Delta t} = n \mid X_t = n\} = 1 - (\mu_n + \lambda_n)\Delta t + o(\Delta t),$$

$$P\{X_{t+\Delta t} = n+1 \mid X_t = n\} = \lambda_n \Delta t + o(\Delta t),$$

$$P\{X_{t+\Delta t} = n-1 \mid X_t = n\} = \mu_n \Delta t + o(\Delta t).$$

As before, we can convert these equations into differential equations for $P_n(t) = P\{X_t = n\}$ and get the system

$$P_n'(t) = \mu_{n+1} P_{n+1}(t) + \lambda_{n-1} P_{n-1}(t) - (\mu_n + \lambda_n) P_n(t). \qquad (3.9)$$

To compute the transition probabilities

$$p_t(m, n) = P\{X_t = n \mid X_0 = m\}$$

we need only solve the system with initial conditions,

$$P_m(0) = 1, \quad P_n(0) = 0, \quad n \ne m.$$


Example 1. The Poisson process with rate parameter $\lambda$ is a birth-and-death process with $\lambda_n = \lambda$ and $\mu_n = 0$.


Example 2. Markovian Queueing Models. Suppose $X_t$ denotes the number of people on line for some service. We assume that people arrive at a rate $\lambda$; more precisely, the arrival of customers follows a Poisson process with rate $\lambda$. Customers are also serviced at an exponential rate $\mu$. We note three different service rules:


(a) M/M/1 queue. In this case there is one server and only the first person in line is being serviced. This gives a birth-and-death process with $\lambda_n = \lambda$ and $\mu_n = \mu$ ($n \ge 1$). The two Ms in the notation refer to the fact that both the arrival and the service times are exponential and hence the process is Markovian. The 1 denotes the fact that there is one server.


(b) M/M/k queue. In this case there are $k$ servers and anyone in the first $k$ positions in the line can be served. If there are $k$ people being served, and each one is served at rate $\mu$, then the rate at which people are leaving the system is $k\mu$. This gives a birth-and-death process with $\lambda_n = \lambda$ and

$$\mu_n = \begin{cases} n\mu, & \text{if } n \le k, \\ k\mu, & \text{if } n \ge k. \end{cases}$$

(c) M/M/∞ queue. In this case there are an infinite number of servers, so everyone in line has a chance of being served. In this case $\lambda_n = \lambda$ and $\mu_n = n\mu$.


Example 3. Population Model. Imagine that the state of the chain represents the number of individuals in a population. Each individual at a certain rate $\lambda$ produces another individual. Similarly each individual dies at rate $\mu$. If all the individuals act independently this can be modelled by a birth-and-death process with $\lambda_n = n\lambda$ and $\mu_n = n\mu$. Note that 0 is an absorbing state in this model. When $\mu = 0$, this is sometimes called the Yule process.


Example 4. Population Model with Immigration. Assume that individuals die and reproduce with rates $\mu$ and $\lambda$, respectively, as in the previous model. We also assume that new individuals arrive at a constant rate $\nu$. This gives a birth-and-death process with $\lambda_n = n\lambda + \nu$ and $\mu_n = n\mu$.


Example 5. Fast-Growing Population Model. Imagine that a population grows at a rate proportional to the square of the number of individuals. Then if we assume no deaths, we have a process with $\lambda_n = n^2\lambda$ and $\mu_n = 0$. The population in this case grows very fast, and we will see later that it actually reaches an “infinite population” in finite time.


We will look more closely at all of these examples, but first we develop some general theory. We call the birth-and-death chain irreducible if all the states communicate. It is not very difficult to see that this happens if and only if all the $\lambda_n$ ($n \ge 0$) and all the $\mu_n$ ($n \ge 1$) are positive. An irreducible chain is recurrent if one always returns to a state; otherwise, it is called transient. For any birth-and-death process, there is a discrete-time Markov chain on $\{0, 1, 2, \ldots\}$ that follows the continuous-time chain “when it moves.” It has transition probabilities

$$p(n, n-1) = \frac{\mu_n}{\mu_n + \lambda_n}, \quad p(n, n+1) = \frac{\lambda_n}{\mu_n + \lambda_n}.$$

One can check that the continuous-time chain is recurrent if and only if the corresponding discrete-time chain is recurrent. Let $a(n)$ be the probability that the chain starting at state $n$ ever reaches state 0. Note that $a(0) = 1$ and the value of $a(n)$ is the same whether one considers the continuous-time or the discrete-time chain. From our discussion of discrete-time chains, we see that $a(n)$ satisfies

$$a(n)\,(\mu_n + \lambda_n) = a(n-1)\,\mu_n + a(n+1)\,\lambda_n, \quad n > 0. \qquad (3.10)$$

If the chain is transient, $a(n) \to 0$ as $n \to \infty$. If the chain is recurrent, no solution of this equation will exist with $a(0) = 1$, $0 \le a(n) \le 1$, $a(n) \to 0$ ($n \to \infty$).

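We now give a necessary and sufficient condition for a birth-and-death chain to be transient. We will try to find the function $a(n)$. Equation (3.10) can be rewritten

$$a(n) - a(n+1) = \frac{\mu_n}{\lambda_n}\,[a(n-1) - a(n)], \quad n \ge 1.$$

If we continue, we get

$$a(n) - a(n+1) = \frac{\mu_1 \cdots \mu_n}{\lambda_1 \cdots \lambda_n}\,[a(0) - a(1)].$$

Hence,

$$a(n+1) = [a(n+1) - a(0)] + a(0) = \sum_{j=0}^{n} [a(j+1) - a(j)] + 1 = [a(1) - 1] \sum_{j=0}^{n} \frac{\mu_1 \cdots \mu_j}{\lambda_1 \cdots \lambda_j} + 1,$$

where the $j = 0$ term of the sum equals 1 by convention. We can find a nontrivial solution if the sum converges. We have established the following.


Fact. The birth-and-death chain is transient if and only if

$$\sum_{n=1}^{\infty} \frac{\mu_1 \cdots \mu_n}{\lambda_1 \cdots \lambda_n} < \infty. \qquad (3.11)$$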


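As an example, consider the queueing models (Example 2). For the M/M/1 queue,

$$\sum_{n=1}^{\infty} \frac{\mu_1 \cdots \mu_n}{\lambda_1 \cdots \lambda_n} = \sum_{n=1}^{\infty} \left(\frac{\mu}{\lambda}\right)^n,$$

which converges if and only if $\mu < \lambda$. Consider now the M/M/k queue. For any $n \ge k$,

$$\frac{\mu_1 \cdots \mu_n}{\lambda_1 \cdots \lambda_n} = \frac{k!}{k^k} \left(\frac{k\mu}{\lambda}\right)^n.$$

Therefore, in this case the sum is finite and the chain is transient if and only if $k\mu < \lambda$. Finally for the M/M/∞ queue,

$$\sum_{n=1}^{\infty} \frac{\mu_1 \cdots \mu_n}{\lambda_1 \cdots \lambda_n} = \sum_{n=1}^{\infty} n! \left(\frac{\mu}{\lambda}\right)^n = \infty.$$

Hence, for all values of $\mu$ and $\lambda$ the chain is recurrent. These three results can be summarized by saying that the queueing models are transient (and hence the lines grow longer and longer) if and only if the (maximal) service rate is strictly less than the arrival rate.
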
For recurrent chains, there may or may not be a limiting probability. Again, we call an irreducible chain positive recurrent if there exists a probability distribution $\pi(n)$ such that

$$\lim_{t \to \infty} P\{X_t = n \mid X_0 = m\} = \pi(n)$$

for all states $m$. Otherwise a recurrent chain is called null recurrent. If the system is in the limiting probability, i.e., if $P_n(t) = \pi(n)$, where $P_n(t)$ is as in (3.9), then $P_n'(t)$ should equal 0. In other words $\pi$ should satisfy

$$0 = \lambda_{n-1}\,\pi(n-1) + \mu_{n+1}\,\pi(n+1) - (\lambda_n + \mu_n)\,\pi(n). \qquad (3.12)$$

Again, as for the case of discrete-time chains, we can find $\pi$ by solving these equations. If we can find a probability distribution that satisfies (3.12), then the chain is positive recurrent and that distribution is the unique equilibrium distribution.

We can solve (3.12) directly. First, the equation for $n = 0$ gives

$$\pi(1) = \frac{\lambda_0}{\mu_1}\,\pi(0).$$

For $n \ge 1$, the equation can be written

$$\mu_{n+1}\,\pi(n+1) - \lambda_n\,\pi(n) = \mu_n\,\pi(n) - \lambda_{n-1}\,\pi(n-1).$$

If we iterate this equation, we get

$$\mu_{n+1}\,\pi(n+1) - \lambda_n\,\pi(n) = \mu_1\,\pi(1) - \lambda_0\,\pi(0) = 0.$$

Hence, $\pi(n+1) = (\lambda_n/\mu_{n+1})\,\pi(n)$, and by iterating we get the solution

$$\pi(n) = \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}\,\pi(0).$$

We now impose the condition that $\pi$ be a probability measure. We can arrange this if and only if $\sum_n \pi(n) < \infty$. We have established the following.


Fact. A birth-and-death chain is positive recurrent if and only if

$$\sum_{n=0}^{\infty} \frac{\lambda_0 \cdots \lambda_{n-1}}{\mu_1 \cdots \mu_n} < \infty$$

(by convention, the $n = 0$ term in this sum is equal to 1). In this case the invariant probability is given by

$$\pi(n) = \frac{\lambda_0 \cdots \lambda_{n-1}}{\mu_1 \cdots \mu_n} \left[\sum_{m=0}^{\infty} \frac{\lambda_0 \cdots \lambda_{m-1}}{\mu_1 \cdots \mu_m}\right]^{-1}. \qquad (3.13)$$

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As an example, consider the queueing models again. For the M/M/1 queue, 


or And ae A ~1 
» Ha --- Un => (5) =(1 4 | 


provided » < p and is infinite otherwise. Hence this chain is positive recurrent 
for A < pin which case the equilibrium distribution is 


oJ) 


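Note that the expected length of the queue in equilibrium is

$$\sum_{n=0}^{\infty} n\,\pi(n) = \sum_{n=0}^{\infty} n \left(\frac{\lambda}{\mu}\right)^n \left(1 - \frac{\lambda}{\mu}\right) = \frac{\lambda}{\mu - \lambda}.$$

In particular, the expected length gets large as $\lambda$ approaches $\mu$. In the case
of the M/M/k queue, the exact form of $\pi$ is a little messy, but it is easy to
verify that the chain is positive recurrent if and only if $\lambda < k\mu$. Finally, for
the M/M/$\infty$ queue,

$$\sum_{n=0}^{\infty} \frac{\lambda_0 \cdots \lambda_{n-1}}{\mu_1 \cdots \mu_n} = \sum_{n=0}^{\infty} \frac{1}{n!}\left(\frac{\lambda}{\mu}\right)^n = e^{\lambda/\mu}.$$

Hence, the chain is positive recurrent for all $\lambda, \mu$ and has equilibrium
distribution

$$\pi(n) = e^{-\lambda/\mu}\,\frac{(\lambda/\mu)^n}{n!},$$

i.e., the equilibrium distribution is a Poisson distribution with parameter $\lambda/\mu$.
The mean queue length in equilibrium is $\lambda/\mu$.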
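
The product formula (3.13) is easy to evaluate numerically. The following is a minimal Python sketch and not part of the original text: the function name `birth_death_stationary`, the truncation level `n_max`, and the rates $\lambda = 1$, $\mu = 2$ are assumptions chosen for illustration. It truncates the infinite sum, which is reasonable only when the tail is negligible, as it is for these rates.

```python
def birth_death_stationary(lam, mu, n_max):
    """Approximate the invariant probability (3.13) on the states 0..n_max.

    lam(n) and mu(n) return the birth and death rates in state n. The
    infinite sum is truncated at n_max (an assumption of this sketch).
    """
    weights = [1.0]  # the n = 0 term is 1 by convention
    for n in range(1, n_max + 1):
        # pi(n) / pi(n - 1) = lambda_{n-1} / mu_n
        weights.append(weights[-1] * lam(n - 1) / mu(n))
    total = sum(weights)
    return [w / total for w in weights]


# M/M/1 queue with lambda = 1, mu = 2 (illustrative values).
pi_mm1 = birth_death_stationary(lambda n: 1.0, lambda n: 2.0, 200)
mean_mm1 = sum(n * p for n, p in enumerate(pi_mm1))

# M/M/infinity queue with the same lambda and mu, so mu_n = n * mu.
pi_inf = birth_death_stationary(lambda n: 1.0, lambda n: 2.0 * n, 200)
mean_inf = sum(n * p for n, p in enumerate(pi_inf))
```

With these rates the formulas above predict a mean queue length of $\lambda/(\mu - \lambda) = 1$ for the M/M/1 queue and $\lambda/\mu = 1/2$ for the M/M/$\infty$ queue, which `mean_mm1` and `mean_inf` should reproduce up to truncation and rounding error.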

Conditions under which the population models are positive recurrent, null 
recurrent, or transient are discussed in Exercises 3.12 and 3.13. 

We finish by considering two pure birth processes. A birth-and-death
process is a pure birth process if $\mu_n = 0$ for all $n$. We first consider the Yule
process with $\lambda_n = n\lambda$. Let us assume that the population starts with one
individual; hence, $P_1(0) = 1$, $P_n(0) = 0$ $(n > 1)$, where again $P_n(t) = P\{X_t = n\}$.
The $P_n(t)$ satisfy the differential equations

$$P_n'(t) = (n-1)\lambda\,P_{n-1}(t) - n\lambda\,P_n(t), \quad n \geq 1.$$

One can solve these equations recursively, but since the computations are a
little messy, we will skip them and simply state that the solution is

$$P_n(t) = e^{-\lambda t}\,(1 - e^{-\lambda t})^{n-1}, \quad n \geq 1.$$

(It is not too difficult to verify that $P_n(t)$ defined as above does satisfy these
equations.) The form for $P_n(t)$ is nice; in fact, for a fixed $t$, $X_t$ has a geometric
distribution with parameter $p = e^{-\lambda t}$. This allows us immediately to compute
the expected population size at time $t$,

$$E(X_t) = \sum_{n=1}^{\infty} n\,P_n(t) = e^{\lambda t}.$$

We could derive this last result in a different way. Let $f(t) = E(X_t)$. Then

$$f'(t) = \sum_{n=1}^{\infty} n\,P_n'(t) = \sum_{n=1}^{\infty} n\,[(n-1)\lambda\,P_{n-1}(t) - n\lambda\,P_n(t)] = \sum_{n=1}^{\infty} n\lambda\,P_n(t) = \lambda f(t).$$

Therefore, $f(t)$ satisfies the standard equation for exponential growth and the
initial condition $f(0) = 1$ immediately gives the solution $f(t) = e^{\lambda t}$. There is
one other way we can look at the Yule process. Consider the time $Y_n$ when
the population first reaches $n$, i.e.,

$$Y_n = \inf\{t : X_t = n\}.$$

Then $Y_n = T_1 + \cdots + T_{n-1}$, where $T_i$ measures the time between the arrival of
the $i$th and $(i+1)$st individual. The random variables $T_i$ are independent and
$T_i$ has an exponential distribution with parameter $i\lambda$. In particular,
$E(T_i) = 1/(i\lambda)$ and $\mathrm{Var}(T_i) = 1/(i\lambda)^2$. Therefore,

$$E(Y_n) = \sum_{i=1}^{n-1} \frac{1}{i\lambda} = \frac{1}{\lambda} \sum_{i=1}^{n-1} \frac{1}{i} \approx \frac{\ln n}{\lambda}.$$

Also $\mathrm{Var}(Y_n) \leq \sum_{i=1}^{\infty} (i\lambda)^{-2} < \infty$. Hence, $Y_n$ equals $\ln n/\lambda$ up to a small
random error which is bounded as $n$ gets large. If it takes time $\ln n/\lambda$ to reach
a population of $n$ individuals, then in time $t$ we would expect $e^{\lambda t}$ individuals.
Now consider the fast-growing population model, Example 5, with
$\lambda_n = n^2\lambda$. Again let us consider $Y_n$, the time until the $n$th individual enters the
population. In this case, an interesting phenomenon occurs. Consider

$$Y_\infty = T_1 + T_2 + T_3 + \cdots.$$

Then

$$E(Y_\infty) = \sum_{i=1}^{\infty} E(T_i) = \sum_{i=1}^{\infty} \frac{1}{i^2\lambda} < \infty.$$

In particular, with probability 1, $Y_\infty < \infty$! This says that in finite time
the population grows to an infinite size. This phenomenon is often called
explosion. For a pure birth process, explosion occurs if and only if $E(Y_\infty) < \infty$,
i.e., if and only if

$$\sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty.$$
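
The contrast between the Yule process and the fast-growing model can be seen by computing $E(Y_n) = \sum_{i=1}^{n-1} 1/\lambda_i$ for large $n$. The sketch below is illustrative only: the helper name `expected_hitting_time` and the value $\lambda = 2$ are assumptions, not taken from the text.

```python
import math


def expected_hitting_time(rate, n):
    """E(Y_n) = sum_{i=1}^{n-1} 1/rate(i) for a pure birth process started at 1."""
    return sum(1.0 / rate(i) for i in range(1, n))


lam = 2.0   # illustrative value of lambda
n = 10**5

# Yule process, lambda_i = i * lambda: the sum grows like ln(n) / lambda.
yule = expected_hitting_time(lambda i: i * lam, n)

# Fast-growing model, lambda_i = i^2 * lambda: the sum stays bounded.
fast = expected_hitting_time(lambda i: i * i * lam, n)

print(yule, math.log(n) / lam)
print(fast, math.pi**2 / (6 * lam))
```

For the Yule process the partial sums keep growing with $n$ and differ from $\ln n/\lambda$ only by a bounded amount, while for $\lambda_n = n^2\lambda$ they approach the finite limit $\pi^2/(6\lambda)$, consistent with the explosion criterion.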


3.4 General Case 


Suppose we have a countable (perhaps infinite) state space $S$ and rates
$\alpha(x, y)$ denoting the rate at which the state is changing from $x$ to $y$. Suppose
for each $x$,

$$\alpha(x) = \sum_{y \neq x} \alpha(x, y) < \infty.$$

Then we can use the “exponential alarm clocks” at each state in order to
construct a time-homogeneous, continuous-time Markov chain $X_t$ such that
for each $x \neq y$,

$$P\{X_{t+\Delta t} = y \mid X_t = x\} = \alpha(x, y)\,\Delta t + o_x(\Delta t).$$

Here we write $o_x(\cdot)$ to show that the size of the error term can depend on the
state $x$. If the rates $\alpha$ are not bounded, it is possible for the chain to have
explosion in finite time as was seen in the case of the fast-growing population 
model in Section 3.3. Let us assume for the time being that we have a chain for 
which explosion does not occur (it is sometimes difficult to determine whether 
or not explosion occurs). 

We will consider the transition probabilities

$$p_t(x, y) = P\{X_t = y \mid X_0 = x\} = P\{X_{t+s} = y \mid X_s = x\}.$$

To derive a differential equation for the transition probabilities in the same
manner as in the previous sections, we write

$$\begin{aligned}
p_{t+\Delta t}(x, y) &= p_t(x, y)\,p_{\Delta t}(y, y) + \sum_{z \neq y} p_t(x, z)\,p_{\Delta t}(z, y) \\
&= p_t(x, y)\,[1 - \alpha(y)\,\Delta t + o_y(\Delta t)] + \sum_{z \neq y} p_t(x, z)\,[\alpha(z, y)\,\Delta t + o_z(\Delta t)] \\
&= p_t(x, y)\,[1 - \alpha(y)\,\Delta t] + \sum_{z \neq y} p_t(x, z)\,\alpha(z, y)\,\Delta t + \sum_{z \in S} p_t(x, z)\,o_z(\Delta t).
\end{aligned}$$

If we can combine the last error term so that

$$\sum_{z \in S} p_t(x, z)\,o_z(\Delta t) = o(\Delta t), \tag{3.14}$$

then we can conclude that the transition probabilities satisfy the system of
equations

$$p_t'(x, y) = -\alpha(y)\,p_t(x, y) + \sum_{z \neq y} \alpha(z, y)\,p_t(x, z),$$

where the derivative is with respect to time. These are sometimes called
the forward equations for the chain. In most cases of interest, including all
the examples in the first three sections, (3.14) can be justified. There are
examples, however, where the forward equations cannot be justified.

There is another set of equations called the backward equations which always
hold. For the backward equations we write

$$\begin{aligned}
p_{t+\Delta t}(x, y) &= \sum_{z \in S} p_{\Delta t}(x, z)\,p_t(z, y) \\
&= \sum_{z \neq x} [\alpha(x, z)\,\Delta t + o_x(\Delta t)]\,p_t(z, y) + [1 - \alpha(x)\,\Delta t + o_x(\Delta t)]\,p_t(x, y).
\end{aligned}$$

The error term depends only on $x$. With a little work one can show that one
can always take the limit as $\Delta t$ goes to 0 and get

$$p_t'(x, y) = -\alpha(x)\,p_t(x, y) + \sum_{z \neq x} \alpha(x, z)\,p_t(z, y).$$

In the case of a finite state space with infinitesimal generator $A$, the
backward equations for the transition matrix $P_t$ become, in matrix form,

$$\frac{d}{dt} P_t = A P_t,$$

which can be compared to the forward equation (3.8). Both equations (with
initial condition $P_0 = I$) have the solution

$$P_t = e^{tA}.$$
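
For a small generator the matrix exponential can be approximated directly from its power series. The following pure-Python sketch is only an illustration: the helper names `mat_mul` and `mat_exp`, the number of series terms, and the two-state rates $a = 2$, $b = 3$ are assumptions. It compares the series with the standard closed form for a two-state chain, $P_t(1,1) = \frac{b}{a+b} + \frac{a}{a+b}e^{-(a+b)t}$.

```python
import math


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, t, terms=60):
    """Approximate e^{tA} by the truncated series sum_k (tA)^k / k!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (tA)^k / k! by multiplying the previous term by tA / k
        term = mat_mul(term, [[t * entry / k for entry in row] for row in A])
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


a, b, t = 2.0, 3.0, 0.5          # illustrative rates and time
A = [[-a, a], [b, -b]]           # generator: rows sum to 0
P_t = mat_exp(A, t)
closed_form = b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)
```

Each row of `P_t` should sum to 1, and `P_t[0][0]` should agree with `closed_form` up to rounding. A truncated series is adequate here because $t$ times the size of the rates is small; for large rates or times a more careful method would be needed.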


3.5 Exercises 


3.1 Suppose that the number of calls per hour arriving at an answering 
service follows a Poisson process with $\lambda = 4$.

(a) What is the probability that fewer than two calls come in the first hour?

(b) Suppose that six calls arrive in the first hour. What is the probability 
that at least two calls will arrive in the second hour? 

(c) The person answering the phones waits until fifteen phone calls have 
arrived before going to lunch. What is the expected amount of time that the 
person will wait? 

(d) Suppose it is known that exactly eight calls arrived in the first two 
hours. What is the probability that exactly five of them arrived in the first 
hour? 

(e) Suppose it is known that exactly k calls arrived in the first four hours. 
What is the probability that exactly $j$ of them arrived in the first hour?


3.2 Let $X_t$ and $Y_t$ be two independent Poisson processes with rate
parameters $\lambda_1$ and $\lambda_2$, respectively, measuring the number of customers arriving in
stores 1 and 2, respectively.

(a) What is the probability that a customer arrives in store 1 before any
customers arrive in store 2?

(b) What is the probability that in the first hour, a total of exactly four 
customers have arrived at the two stores? 

(c) Given that exactly four customers have arrived at the two stores, what 
is the probability that all four went to store 1? 

(d) Let $T$ denote the time of arrival of the first customer at store 2. Then
$X_T$ is the number of customers in store 1 at the time of the first customer
arrival at store 2. Find the probability distribution of $X_T$ (i.e., for each $k$,
find $P\{X_T = k\}$).


3.3 Suppose $X_t$ and $Y_t$ are independent Poisson processes with parameters
$\lambda_1$ and $\lambda_2$, respectively, measuring the number of calls arriving at two different
phones. Let $Z_t = X_t + Y_t$.

(a) Show that $Z_t$ is a Poisson process. What is the rate parameter for $Z$?

(b) What is the probability that the first call comes on the first phone?

(c) Let $T$ denote the first time that at least one call has come from each
of the two phones. Find the density and distribution function of the random
variable $T$.


3.4 Let A be the infinitesimal generator for an irreducible, continuous-time 
Markov chain with finite state space. Then the rows of A add up to 0 and 
the nondiagonal elements of A are nonnegative. 

(a) Let a be some positive number greater than all the entries of A. Let 
P = (1/a)A +I. Show that P is the transition matrix for a discrete-time, 
irreducible, aperiodic Markov chain. 

(b) Use this to conclude: A has a unique left eigenvector with eigenvalue 0 
that is a probability vector and all the other eigenvalues of A have real part 
strictly less than 0.


3.5 Let $X_t$ be a Markov chain with state space $\{1, 2\}$ and rates $\alpha(1,2) = 1$,
$\alpha(2,1) = 4$. Find $P_t$.


3.6 Repeat Exercise 3.5 with state space $\{1, 2, 3\}$ and rates $\alpha(1,2) = 1$, $\alpha(2,1) = 4$,
$\alpha(2,3) = 1$, $\alpha(3,2) = 4$, $\alpha(1,3) = 0$, $\alpha(3,1) = 0$.


3.7 Let $X_t$ be an irreducible, continuous-time Markov chain. Show that for
each $i, j$ and every $t > 0$,

$$P\{X_t = j \mid X_0 = i\} > 0.$$


3.8 Consider the continuous-time Markov chain with state space {1, 2,3, 4} 
and infinitesimal generator 


tf 22° <3 A 

1{!-3 1 1 1 

2; 0-3 2 1 

cia ce a a ee | 
4, 0 O 1 —-!1 


(a) Find the equilibrium distribution $\pi$.

(b) Suppose the chain starts in state 1. What is the expected amount of 
time until it changes state for the first time? 

(c) Again assume the chain starts in state 1. What is the expected amount 
of time until the chain is in state 4? 


3.9 Repeat Exercise 3.8 with

$$A = \begin{array}{c|rrrr}
 & 1 & 2 & 3 & 4 \\ \hline
1 & -2 & 1 & 1 & 0 \\
2 & 0 & -1 & 1 & 0 \\
3 & 1 & 1 & -3 & 1 \\
4 & 0 & 0 & 1 & -1
\end{array}$$


3.10 Suppose $\alpha$ gives the rates for an irreducible continuous-time Markov
chain on a finite state space. Suppose the invariant probability measure is $\pi$.
Let

$$p(x, y) = \alpha(x, y)/\alpha(x), \quad x \neq y,$$

be the transition probability for the discrete-time Markov chain corresponding
to the continuous-time chain “when it moves.” Find the invariant probability
for the discrete-time chain in terms of $\pi$ and $\alpha$.


3.11 Let $X_t$ be a continuous-time birth-and-death process with birth rate
$\lambda_n = 1 + (1/(n+1))$ and death rate $\mu_n = 1$. Is this process positive recurrent,
null recurrent, or transient? What if $\lambda_n = 1 - (1/(n+2))$?


3.12 Consider the population model (Example 3, Section 3.3). For which
values of $\mu$ and $\lambda$ is extinction certain, i.e., when is the probability of reaching
state 0 equal to 1?


3.13 Consider the population model with immigration (Example 4, Section
3.3). For which values of $\mu, \lambda, \nu$ is the chain positive recurrent, null recurrent,
transient?


3.14 Consider a birth-and-death process with $\lambda_n = 1/(n+1)$ and $\mu_n = 1$.
Show that the process is positive recurrent and give the stationary
distribution.


3.15 Suppose one has a deterministic model for population where the
population grows proportionately to the square of the current population. In other
words, the population $p(t)$ satisfies the differential equation

$$\frac{dp}{dt} = c\,[p(t)]^2,$$

for some constant $c > 0$. Assume $p(0) = 1$. Solve this differential equation
(by separation of variables) and describe what happens as time increases.


3.16 Consider a birth-and-death process with birth rates $\lambda_n$ and death rates
$\mu_n$. What are the backward equations for the transition probabilities $p_t(m, n)$?



Optimal Stopping


4.1 Optimal Stopping of Markov Chains 


Imagine the following simple game. A player rolls a die. If the player rolls 
a 6 the player wins no money. Otherwise, the player may either quit the game 
and win k dollars, where k is the roll of the die, or may roll again. If the 
player rolls again, the game continues until either a 6 is rolled or the player 
quits. The total payoff for the game is always k dollars, where k is the value 
of the last roll (unless the roll is a 6 in which case the payoff is 0). What is 
the optimal strategy for the player? 

In order to determine the optimal strategy, it is necessary to decide what 
should be optimized. For example, if the player only wants to guarantee that 
the payoff is positive, then the game should be stopped after the first roll— 
either the player has already lost (if a 6 is rolled) or the player can guarantee 
a positive payoff by stopping. However, it is reasonable to consider what 
happens if the player decides to maximize the expected payoff. Let us analyze 
this problem and then show how this applies to more general Markov chain 
problems. 

We first let $f(k)$ denote the payoff associated with each roll. In this example
$f(k) = k$ if $k \leq 5$ and $f(6) = 0$. We let $v(k)$ be the expected winnings of the
player given that the first roll is $k$ assuming that the player takes the optimal
strategy. At this moment we may not know what the optimal strategy is, but
it still makes sense to discuss $v$. We will, in fact, write down an equation that
$v$ satisfies and use this to determine $v$ and the optimal strategy. We first note
that $v(6) = 0$ and $v(5) = 5$. The latter is true since it clearly does not pay to
roll again if the first roll is 5, so the optimal strategy is to stop and pick up
\$5. It is not so clear what $v(k)$ is for $k \leq 4$.

Now let $u(k)$, $k \leq 5$, be the amount of payoff that is expected if the player
does not stop after rolling a $k$, but from then on plays according to the optimal
strategy. [In this particular example, $u(k)$ is actually the same for all $k$.] Then
it is easy to see that

$$u(k) = \frac{1}{6}v(1) + \frac{1}{6}v(2) + \frac{1}{6}v(3) + \frac{1}{6}v(4) + \frac{1}{6}v(5) + \frac{1}{6}v(6).$$

We now can write the optimal strategy in terms of $u(k)$: if $f(k) > u(k)$, the
player should stop and take the money; if $f(k) < u(k)$, the player should roll
again. In other words,

$$v(k) = \max\{f(k), u(k)\}.$$

In particular, $v(k) \geq f(k)$. This fact implies that
$u(k) \geq (f(1) + \cdots + f(6))/6 = 5/2$. We now know more about the optimal strategy: if the first
roll is a 1 or a 2 the player should roll again. Hence,

$$v(1) = \frac{v(1) + \cdots + v(6)}{6} = \frac{v(1) + \cdots + v(4)}{6} + \frac{5}{6}.$$


Suppose the first roll is a 4. Suppose that the optimal strategy were to 
continue playing. Then clearly that would also be the optimal strategy if the 
first roll is a 3. Under this strategy, the game would continue until a 5 or a 6 is 
rolled and each of these ending rolls would be equally likely. This would give 
an expected payoff of (5+0)/2 = 5/2, which is less than 4. Hence this cannot 
be the optimal strategy starting with a 4. The player, therefore, should stop 
with a 4 and v(4) = f(4) = 4. We finally consider what happens if the first 
roll is a 3. Suppose the player rolls again whenever a 3 comes up and uses
the optimal strategy otherwise. Let $u$ be the expected winnings in this case.
Then

$$u = \frac{1}{6}u + \frac{1}{6}u + \frac{1}{6}u + \frac{1}{6}(4) + \frac{1}{6}(5).$$

Solving for $u$ we get $u = 9/3 = 3$. Since this equals $f(3)$, the expected payoff for
playing is the same as for stopping and $v(3) = 3$. With these values, we can
solve for $v(1)$ and $v(2)$, getting $v(1) = v(2) = 3$. The optimal strategy is to
play if the first roll is 1 or 2; stop if the first roll is 4, 5, 6; and either play or
stop if the first roll is a 3.

We now generalize these ideas. Suppose P is the transition matrix for a 
discrete-time Markov chain $X_n$ with state space $S$. For ease we will assume
that $S$ is finite, but much of what follows can be applied to the infinite state
space case. Assume there is a payoff function f that assigns to each state the 
payoff if the chain is stopped when it reaches that state. In cases of interest, 
P will not be irreducible since otherwise one could always continue until one 
reached the state that has the maximum payoff. A stopping rule or stopping 
time will be a random variable $T$ that gives the time at which the chain is
stopped. It is important that one must decide whether or not to stop based 
only on what has happened up through step n; in other words, one cannot 
look into the future to decide whether or not to stop. Because we are dealing 
with a time-homogeneous Markov chain it does not take too much work to 
convince oneself that the only reasonable stopping rules that do not look into
the future are of the following form: the state space is divided into two sets
$S_1$ and $S_2$; if the state of the chain is in $S_1$ one continues, if it is in $S_2$ it
stops. The goal is to maximize the expected payoff over all stopping rules.
We let $v(x)$ be the value of a state $x$, i.e., the expected payoff assuming that
the optimal stopping strategy is used. We can write

$$v(x) = \max_T E[f(X_T) \mid X_0 = x],$$

where the maximum is over all legal stopping rules.

There are two main inequalities that $v$ satisfies. First, $v$ is greater than or
equal to the payoff available by stopping,

$$v(x) \geq f(x). \tag{4.1}$$

Second, $v$ is greater than or equal to the maximum expected payoff if one
continues,

$$v(x) \geq Pv(x) = \sum_{y \in S} p(x, y)\,v(y). \tag{4.2}$$

In fact, $v$ is equal to the maximum of these values:

$$v(x) = \max\{f(x), Pv(x)\}. \tag{4.3}$$


If we let $S_1$ be the set of states where one continues and $S_2$ the set of states
where one stops (assuming the optimal strategy), and we let

$$T = \min\{j \geq 0 : X_j \in S_2\},$$

then

$$v(x) = E[f(X_T) \mid X_0 = x].$$

We will characterize the function $v$. We call a function $u$ superharmonic
with respect to $P$ if it satisfies (4.2), i.e.,

$$u(x) \geq Pu(x).$$

Suppose $u$ is superharmonic and $T$ is the time associated to a stopping rule
as above. Consider the time $T_n = \min\{T, n\}$. We claim that

$$u(x) \geq E[u(X_{T_n}) \mid X_0 = x].$$


To see this, note that it is trivially true for $n = 0$. Assume it is true for $n - 1$.
Then

$$\begin{aligned}
E[u(X_{T_n}) \mid X_0 = x]
&= \sum_{y \in S} P\{X_{T_n} = y \mid X_0 = x\}\,u(y) \\
&= \sum_{y \in S} \sum_{z \in S} P\{X_{T_n} = y \mid X_{T_{n-1}} = z\}\,P\{X_{T_{n-1}} = z \mid X_0 = x\}\,u(y) \\
&= \sum_{z \in S_2} \sum_{y \in S} P\{X_{T_n} = y \mid X_{T_{n-1}} = z\}\,P\{X_{T_{n-1}} = z \mid X_0 = x\}\,u(y) \\
&\quad + \sum_{z \in S_1} \sum_{y \in S} P\{X_{T_n} = y \mid X_{T_{n-1}} = z\}\,P\{X_{T_{n-1}} = z \mid X_0 = x\}\,u(y).
\end{aligned}$$

If $z \in S_2$, then $P\{X_{T_n} = z \mid X_{T_{n-1}} = z\} = 1$ and hence the first double sum
in the last expression equals

$$\sum_{z \in S_2} P\{X_{T_{n-1}} = z \mid X_0 = x\}\,u(z).$$

If $z \in S_1$, $P\{X_{T_n} = y \mid X_{T_{n-1}} = z\} = p(z, y)$ and hence

$$\sum_{y \in S} P\{X_{T_n} = y \mid X_{T_{n-1}} = z\}\,u(y) = Pu(z) \leq u(z).$$

Hence,

$$E[u(X_{T_n}) \mid X_0 = x] \leq \sum_{z \in S} P\{X_{T_{n-1}} = z \mid X_0 = x\}\,u(z) = E[u(X_{T_{n-1}}) \mid X_0 = x] \leq u(x).$$

Since $u$ is a bounded function, we can let $n \to \infty$ and get

$$u(x) \geq \lim_{n \to \infty} E[u(X_{T_n}) \mid X_0 = x] = E[u(X_T) \mid X_0 = x].$$

Now suppose that $u(x) \geq f(x)$ for all $x$. Then

$$u(x) \geq E[u(X_T) \mid X_0 = x] \geq E[f(X_T) \mid X_0 = x] = v(x).$$
Hence every superharmonic function that is larger than $f$ is greater than or
equal to the value function $v$. Also we note (see Exercise 4.7) that if $\{u_i(x)\}$
is any collection of superharmonic functions, then

$$u(x) = \inf_i u_i(x)$$

is also superharmonic. We have derived the following.

Fact. $v$ is the smallest superharmonic function with respect to $P$ that is
greater than or equal to $f$; equivalently,

$$v(x) = \inf u(x),$$

where the infimum is over all superharmonic functions $u$ with $u(x) \geq f(x)$.


The characterization leads to an algorithm for determining $v$. Start with
the function $u_1(x)$ that equals $f(x)$ if $x$ is an absorbing state and otherwise
equals the maximum value of $f$. This gives a superharmonic function that is
greater than $f$. Let

$$u_2(x) = \max\{Pu_1(x), f(x)\}.$$

Since $u_1$ is superharmonic and $u_1 \geq f$, $u_2(x) \leq u_1(x)$. Also,

$$Pu_2(x) \leq Pu_1(x) \leq u_2(x).$$

Hence, $u_2$ is a superharmonic function greater than $f$. Continuing, we define

$$u_n(x) = \max\{Pu_{n-1}(x), f(x)\},$$

and we see that $u_n$ is a superharmonic function greater than $f$ but less than
$u_{n-1}$. We will show at the end of this section that

$$v(x) = \lim_{n \to \infty} u_n(x).$$


Example 1. If we consider the game that we already analyzed and started
with the function $u = [5, 5, 5, 5, 5, 0]$, then in 10 iterations we would see
$u_{10} = [3.002, 3.002, 3.002, 4, 5, 0]$.
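
The iteration is short enough to write out. The sketch below is one possible Python rendering rather than code from the text: the function name `value_iteration` and the encoding of the die game as a six-state chain (index $k - 1$ for the face $k$, with the face 6 absorbing) are assumptions made for illustration.

```python
def value_iteration(P, f, u, steps):
    """Apply the update u <- max(Pu, f) `steps` times and return the result."""
    n = len(f)
    for _ in range(steps):
        Pu = [sum(P[x][y] * u[y] for y in range(n)) for x in range(n)]
        u = [max(Pu[x], f[x]) for x in range(n)]
    return u


# Die game: from any face 1..5 the next roll is uniform on 1..6;
# a 6 ends the game, so that state is absorbing with payoff 0.
P = [[1 / 6] * 6 for _ in range(5)] + [[0, 0, 0, 0, 0, 1]]
f = [1, 2, 3, 4, 5, 0]

u = value_iteration(P, f, [5, 5, 5, 5, 5, 0], 10)
print([round(val, 3) for val in u])
```

After ten applications of the update the first three entries are within about .0024 of the limiting value 3, and the entries for the faces 4, 5, 6 have already settled at 4, 5, 0.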


Example 2. Suppose $X_n$ is a simple random walk ($p = 1/2$) with absorbing
barriers on $\{0, 1, 2, 3, 4, 5, 6\}$. Let the payoff function $f$ be given by
$f = [0, 2, 4, 5, 9, 3, 0]$ (we write the payoff function as a vector in a natural way).
We will first determine the optimal strategy. Clearly one stops at state 4 and
one has to stop at 0 and 6. From state 5 there is a probability 1/2 of going
to 4 and 1/2 of going to 6; the expected payoff given that we continue is at
least $9/2 > f(5) = 3$, so from 5 we continue. If one starts in state 3, then
one can get an expected payoff of $(4 + 9)/2 = 13/2$ by taking one step and
then stopping. Since this is greater than $f(3) = 5$, it must be optimal to
continue from state 3 and $v(3) \geq 13/2$. Note that from state 2 playing gives
an expected payoff of at least $[f(1) + v(3)]/2 \geq 17/4 > f(2) = 4$. Hence,
we continue on state 2 and $v(2) \geq 17/4$. Similarly, if one continues from
state 1 we can obtain an expected payoff of $v(2)/2 \geq 17/8 > f(1) = 2$, so
the optimal strategy is to continue. Therefore the stopping set in this case is
$S_2 = \{0, 4, 6\}$. The value function can be obtained by

$$v(x) = E[f(X_T) \mid X_0 = x] = 9\,P\{X_T = 4 \mid X_0 = x\}.$$

The probability has been computed before [see (1.16)] and we get

$$v = [0, 9/4, 9/2, 27/4, 9, 9/2, 0].$$

In the graph below the solid line represents $f$ and the dotted line represents
$v$. For simple random walk, the superharmonic functions are the concave
functions. The function $v$ is the smallest concave function satisfying $v \geq f$.

[Figure: the payoff function $f$ (solid line) and the value function $v$ (dotted line) plotted against the states 0 through 6.]

In this example, if we had started with the function $u_1 = [0, 9, 9, 9, 9, 9, 0]$
and performed the algorithm above we would have gotten within .01 of the
actual value of $v$ in about 20 iterations.


In solving the optimal stopping problem we simultaneously compute the
value function $v$ and the optimal stopping strategy. Suppose that we knew
the strategy that we would choose, i.e., we split the state space into two sets
$S_1$ and $S_2$ so that we continue on $S_1$ and stop on $S_2$. Let $u(x)$ be the expected
payoff using this strategy. Then $u$ satisfies:

$$u(x) = f(x), \quad x \in S_2, \tag{4.4}$$

$$u(x) = Pu(x), \quad x \in S_1. \tag{4.5}$$

This is a discrete analogue of a boundary value problem sometimes called the
Dirichlet problem. The boundary is the set $S_2$ where prescribed values are
given. On the “interior” points $S_1$, some difference equation holds. As we
have seen the probabilistic form of the solution of this system can be given by

$$u(x) = E[f(X_T) \mid X_0 = x],$$

where

$$T = \min\{j \geq 0 : X_j \in S_2\}.$$

For a finite-state Markov chain, the solution can be found directly because
(4.4) and (4.5) give $k$ linear equations in $k$ unknowns, where $k$ is the number
of points in $S_1$ and the unknowns are $u(x)$, $x \in S_1$.
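
As an illustration of solving (4.4) and (4.5) directly, the following sketch sets up the linear system for the random walk of Example 2 with stopping set $\{0, 4, 6\}$ and solves it exactly. The function name `solve_dirichlet` is hypothetical, and for simplicity the sketch keeps one equation per state (the boundary rows are trivial) instead of reducing to the $k$ unknowns on $S_1$; plain Gauss-Jordan elimination over fractions is used only because the system is tiny.

```python
from fractions import Fraction


def solve_dirichlet(P, f, stop):
    """Solve u = f on `stop` and u = Pu elsewhere by exact Gauss-Jordan elimination."""
    n = len(f)
    rows = []
    for x in range(n):
        if x in stop:
            # Boundary condition (4.4): u(x) = f(x).
            row = [Fraction(int(x == y)) for y in range(n)] + [Fraction(f[x])]
        else:
            # Interior condition (4.5): u(x) - sum_y p(x, y) u(y) = 0.
            row = [Fraction(int(x == y)) - P[x][y] for y in range(n)] + [Fraction(0)]
        rows.append(row)
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [a / scale for a in rows[col]]
        for r in range(n):
            factor = rows[r][col]
            if r != col and factor != 0:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[x][n] for x in range(n)]


# Simple random walk on {0, ..., 6} with absorbing barriers.
P = [[Fraction(0)] * 7 for _ in range(7)]
P[0][0] = P[6][6] = Fraction(1)
for x in range(1, 6):
    P[x][x - 1] = P[x][x + 1] = Fraction(1, 2)

f = [0, 2, 4, 5, 9, 3, 0]
u = solve_dirichlet(P, f, stop={0, 4, 6})
```

The result agrees with $9\,P\{X_T = 4 \mid X_0 = x\}$: the entries are $0, 9/4, 9/2, 27/4, 9, 9/2, 0$.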

We now verify that the algorithm does converge to the value function $v$.
Let $u(x) = \lim_{n \to \infty} u_n(x)$. Since $u$ is the decreasing limit of superharmonic
functions, $u$ is superharmonic (see Exercise 4.7). Also $u(x) \geq f(x)$ for all $x$.
Hence by the characterization of $v$, we get

$$u(x) \geq v(x). \tag{4.6}$$

Let the stopping set $S_2$ be defined by

$$S_2 = \{x : u(x) = f(x)\},$$

$$S_1 = \{x : u(x) > f(x)\}.$$

On $S_1$, $Pu(x) = u(x)$ (if $Pu(x) < u(x)$, then for some $n$, $Pu_n(x) < u(x) \leq u_n(x)$
and hence $u_{n+1}(x) = \max\{Pu_n(x), f(x)\} < u(x)$, which is impossible).
Therefore,

$$u(x) = E[u(X_T) \mid X_0 = x],$$

where $T$ is the strategy associated with the sets $S_1, S_2$. Since $v(x)$ is the
largest expected value over all choices of stopping sets,

$$u(x) \leq v(x). \tag{4.7}$$

Combining (4.6) and (4.7) we see that $u(x) = v(x)$ for all $x$.


4.2 Optimal Stopping with Cost 


Consider the first example of the previous section, and suppose that there
is a charge of \$1 for each additional roll, i.e., on each roll we can either take
the payoff associated with that roll or pay \$1 and roll again. In general, we
can assume that there is a cost $g(x)$ associated with each state that must be
paid to continue the chain. As before we assume we have a payoff function $f$
and we let $v(x)$ be the expected value of the payoff minus the cost assuming
a stopping rule is chosen that maximizes this expected value. We can write

$$v(x) = \max_T E\left[f(X_T) - \sum_{j=0}^{T-1} g(X_j) \;\middle|\; X_0 = x\right],$$

where again the maximum is over all legal stopping times $T$. Then $v(x)$
satisfies:

$$v(x) = \max\{f(x), Pv(x) - g(x)\}.$$

Here, the expected payoff minus cost if the chain is continued is $Pv(x) - g(x)$.
Again we can divide $S$ into $S_1$ and $S_2$ where

$$S_2 = \{x : v(x) = f(x)\},$$

and the optimal stopping rule is to stop when the chain enters a state in $S_2$.

Using a similar argument as in Section 4.1, the value function $v$ for this
example can be characterized as the smallest function $u$ greater than $f$ that
satisfies

$$u(x) \geq Pu(x) - g(x).$$

In other words,

$$v(x) = \inf u(x),$$

where the infimum is over all $u$ satisfying $u(x) \geq f(x)$ and $u(x) \geq Pu(x) - g(x)$.
To find the value function, we may use an algorithm similar to that in
Section 4.1. We define $u_1$ to be the function that equals $f$ on all absorbing
states and equals the maximum value of $f$ everywhere else. We then define

$$u_n(x) = \max\{f(x), Pu_{n-1}(x) - g(x)\},$$

and then

$$v(x) = \lim_{n \to \infty} u_n(x).$$


Example 1. Suppose we consider the die game with $f = [1, 2, 3, 4, 5, 0]$ and
$g = [1, 1, 1, 1, 1, 1]$. The cost function makes it less likely that we would want
to roll again, so it is clear that we should stop if we get a 4 or a 5; similarly,
we should stop if we get a 3 since we were indifferent before with this roll
and it costs to roll again. If we get a 1, then by rolling again we can get an
expected payoff of at least 5/2 with a cost of 1. Hence we can expect a net
gain of at least 3/2. Therefore we should play if we get a 1.

Suppose we roll again whenever we get a 1 or 2 and stop otherwise. Let
$u(k)$ be the expected winnings with this strategy. Then $u(1) = u(2) = u$ and
$u(k) = k$, $k = 3, 4, 5$. Also,

$$u = \frac{1}{6}u(1) + \frac{1}{6}u(2) + \frac{1}{6}u(3) + \frac{1}{6}u(4) + \frac{1}{6}u(5) + \frac{1}{6}u(6) - 1 = \frac{1}{3}u + 1.$$

Solving for $u$ gives $u = 3/2$. Since this is less than $f(2) = 2$, it must be
correct to stop at 2. Hence the stopping set is $S_2 = \{2, 3, 4, 5, 6\}$ and the
value function is

$$v = [8/5, 2, 3, 4, 5, 0].$$

If we started with the initial $u_1 = [5, 5, 5, 5, 5, 0]$ and performed the
algorithm described above, then after only a few iterations we would have

$$u_n = [1.6, 2, 3, 4, 5, 0].$$
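
The comparison of strategies in this example reduces to solving one linear equation per strategy, which can be checked exactly. In the sketch below the function name `roll_again_value` and its arguments are hypothetical; it assumes the player rolls again on a fixed set of faces (not including 6), stops on all others, and pays `cost` for each additional roll.

```python
from fractions import Fraction


def roll_again_value(roll_again_on, cost=1):
    """Net expected payoff u of rolling again under a fixed strategy.

    The player rolls again on the faces in `roll_again_on` and stops on the
    others, so u solves u = (m/6) u + (sum of stopping payoffs)/6 - cost,
    where m is the number of faces on which the player rolls again.
    """
    f = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 0}
    stop_part = sum(Fraction(f[k], 6) for k in f if k not in roll_again_on)
    return (stop_part - cost) / (1 - Fraction(len(roll_again_on), 6))


u_12 = roll_again_value({1, 2})  # roll again on a 1 or a 2
u_1 = roll_again_value({1})      # roll again only on a 1
```

Here `u_12` is $3/2$, which is less than $f(2) = 2$, and `u_1` is $8/5$, matching the first entry of the value function above.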


Example 2. Consider the other example of the previous section where $X_n$
is a simple random walk with absorbing boundary on $\{0, 1, \ldots, 6\}$ and
$f = [0, 2, 4, 5, 9, 3, 0]$. Suppose we impose a cost of .5 to move from states 0, 1, 2
and a cost of 1 to move from 3, 4, 5, 6, i.e., a cost function

$$g = [.5, .5, .5, 1, 1, 1, 1].$$

If we start with initial $u_1 = [0, 9, 9, 9, 9, 9, 0]$, then in only six iterations we get

$$u_6 = [0, 2, 4, 5.5, 9, 3.5, 0],$$

which gives the value for $v$. In this case the stopping set is $S_2 = \{0, 1, 2, 4, 6\}$.


Example 3. With a cost function, it is possible to have a nontrivial problem
even if the Markov chain is irreducible. Suppose we play the following game:
roll two dice; the player may stop at any time and take the roll on the dice
or the player may pay 2 units if the roll is less than 5 and 1 unit if the roll
is greater than or equal to 5 and roll again. In this case the state space is
$S = \{2, 3, \ldots, 12\}$ and

$$f = [2, 3, 4, \ldots, 12], \qquad g = [2, 2, 2, 1, \ldots, 1].$$

If we start with the initial guess $u_1 = [12, 12, \ldots, 12]$ then within 20 iterations
we converge to the value function $v$,

$$v = \left[\frac{17}{3}, \frac{17}{3}, \frac{17}{3}, \frac{20}{3}, \frac{20}{3}, 7, 8, 9, 10, 11, 12\right].$$

The stopping set is $S_2 = \{7, 8, \ldots, 12\}$.


4.3 Optimal Stopping with Discounting


It is often appropriate in modelling financial matters to assume that the
value of money decreases with time. Let us assume that a discount factor
$\alpha < 1$ is given. By this we mean that 1 dollar received after one time unit is
the same as $\alpha$ dollars received in the present. Again suppose we have a
Markov chain $X_n$ with transition matrix $P$ and a payoff function $f$. It is now
the goal to optimize the expected value of the payoff, taking into consideration
the decreasing value of the payoff. If we stop after $k$ steps, then the present
value of the payoff in $k$ steps is $\alpha^k$ times the actual payoff.

In this case the value function is given by

$$v(x) = \max_T E[\alpha^T f(X_T) \mid X_0 = x],$$

where again the maximum is over all legal stopping rules. To obtain this value
function, we characterize $v$ as the smallest function $u$ satisfying

$$u(x) \geq f(x),$$

$$u(x) \geq \alpha Pu(x).$$

We may obtain $v$ with a similar algorithm as before. Start with an initial
function $u_1$ equal to $f$ at all absorbing states and equal to the maximum
value of $f$ at all other states. Then define $u_n$ recursively by

$$u_n(x) = \max\{f(x), \alpha Pu_{n-1}(x)\}.$$

Then

$$v(x) = \lim_{n \to \infty} u_n(x).$$


Example 1. Consider the die game again. Assume a discounting factor of
$\alpha = .8$. Since discounting can only make it more likely to stop it is easy to
see that one should stop if the first roll is a 3, 4, or 5. If the first roll is a 1,
one can get an expected payoff of at least $.8[(1 + 2 + 3 + 4 + 5)/6] = 2$ by
rolling again, so it is best to roll again. Suppose we use the strategy to roll
again with a 1, 2 and to stop otherwise and let $u$ be the expected winnings
given that one rolls again. Then

$$u = .8\left(\frac{u}{6} + \frac{u}{6} + \frac{3}{6} + \frac{4}{6} + \frac{5}{6}\right).$$

Solving for $u$ we get $u = 24/11 > 2$ so it must be optimal to roll again with a
2. Therefore $S_2 = \{3, 4, 5, 6\}$ and

$$v = \left[\frac{24}{11}, \frac{24}{11}, 3, 4, 5, 0\right].$$

Example 2. Consider the example of a simple random walk with absorbing
boundaries on $\{0, 1, \ldots, 6\}$ and $f = [0, 2, 4, 5, 9, 3, 0]$. Suppose that there is
no cost function, but the value of money is discounted at rate $\alpha = .9$. If we
start with $u_1 = [0, 9, 9, 9, 9, 9, 0]$ then in seven iterations we converge to the
value

$$u_7 = [0, 2, 4, 5.85, 9, 4.05, 0].$$

The stopping set is $\{0, 1, 2, 4, 6\}$.


It is possible to include both a cost function and a discounting factor.
Suppose in addition to the other assumptions in this section, we have a cost
function $g(x)$ that indicates the cost of taking a step given that the chain is
in state $x$. Then the value function $v$ is the smallest function $u$ satisfying

$$u(x) \geq f(x),$$

$$u(x) \geq \alpha Pu(x) - g(x).$$

Example 3. Consider the random walk with absorbing boundaries
described before with $f = [0, 2, 4, 5, 9, 3, 0]$ and with both the cost function
$g = [.5, .5, .5, 1, 1, 1, 1]$ and the discount factor $\alpha = .9$. If we start with
$u_1 = [0, 9, 9, 9, 9, 9, 0]$ then in only three iterations we converge to

$$v = [0, 2, 4, 5, 9, 3.05, 0].$$

The stopping set is $\{0, 1, 2, 3, 4, 6\}$.
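
The iteration with both a cost and a discount factor can be sketched in a few lines. The code below is an illustration under stated assumptions, not the author's program: the function name `value_iteration` and its argument order are hypothetical, and the update used is $u \leftarrow \max\{f, \alpha Pu - g\}$, by analogy with the algorithms of Sections 4.1 and 4.2. Setting $g = 0$ recovers the discount-only case.

```python
def value_iteration(P, f, g, alpha, u, steps):
    """Apply u <- max(f, alpha * Pu - g) `steps` times and return the result."""
    n = len(f)
    for _ in range(steps):
        Pu = [sum(P[x][y] * u[y] for y in range(n)) for x in range(n)]
        u = [max(f[x], alpha * Pu[x] - g[x]) for x in range(n)]
    return u


# Simple random walk on {0, ..., 6} with absorbing barriers.
P = [[0.0] * 7 for _ in range(7)]
P[0][0] = P[6][6] = 1.0
for x in range(1, 6):
    P[x][x - 1] = P[x][x + 1] = 0.5

f = [0, 2, 4, 5, 9, 3, 0]
g = [0.5, 0.5, 0.5, 1, 1, 1, 1]
u1 = [0, 9, 9, 9, 9, 9, 0]

# Cost and discounting together, three applications of the update.
v = value_iteration(P, f, g, 0.9, u1, 3)
stopping_set = [x for x in range(7) if v[x] == f[x]]

# Discounting only (zero cost), seven applications of the update.
v_discount_only = value_iteration(P, f, [0] * 7, 0.9, u1, 7)
```

Up to rounding, `v` reproduces $[0, 2, 4, 5, 9, 3.05, 0]$ with stopping set $\{0, 1, 2, 3, 4, 6\}$, and `v_discount_only` reproduces $[0, 2, 4, 5.85, 9, 4.05, 0]$ from Example 2.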
Example 4. Consider a random walk with absorbing boundaries on the
state space $\{0, 1, \ldots, 10\}$. Suppose the payoff function is the square of the
site stopped at, i.e.,

$$f = [0, 1, 4, 9, \ldots, 100].$$

We assume that there is a constant cost of .6 and a discounting factor of
$\alpha = .95$. We then start with the initial guess

$$u_1 = [0, 100, 100, 100, \ldots, 100]$$

and after 60 iterations we get

$$u_{60} = [0, 1.51, 4.45, 9.11, 16, 25, 36, 49, 64, 81, 100].$$

The stopping set is $\{0, 4, 5, 6, \ldots, 10\}$.


4.4 Exercises


4.1 Consider a simple random walk (p = 1/2) with absorbing boundaries on 
{0,1,2,...,10}. Suppose the following payoff function is given 


$$[0, 2, 4, 3, 10, 0, 6, 4, 3, 3, 0].$$


Find the optimal stopping rule and give the expected payoff starting at each 
site. 


4.2 The following game is played: you roll two dice. If you roll a 7, the game 
is over and you win nothing. Otherwise, you may stop and receive an amount 
equal to the sum of the two dice. If you continue, you roll again. The game 
ends whenever you roll a 7 or whenever you say stop. If you say stop before 
rolling a 7 you receive an amount equal to the sum of the two dice on the last 
roll. What are your expected winnings: (a) if you always stop after the first roll;
(b) if you play to optimize your expected winnings?


4.3 Consider Exercise 4.1. Do the problem again assuming: 
(a) a constant cost of .75 for each move; 
(b) a discount factor $\alpha = .95$;
(c) both. 


4.4 Consider Exercise 4.2. Do the problem again assuming: 
(a) a cost function of $g = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1]$;
(b) a discount factor $\alpha = .8$;
(c) both. 


4.5 Consider a simple random walk on the following four-vertex graph. 


A B 


D C 
Assume that the payoff function is: $f(A) = 2$, $f(B) = 4$, $f(C) = 5$, $f(D) = 3$. Assume that there is no cost associated with moving, but there is a discount
factor $\alpha$. What is the largest possible value of $\alpha$ so that the optimal stopping
strategy is to stop at every vertex, i.e., so that Sp = {A, B,C, D}? 


4.6 Consider the following simple game. You roll a single die. If it comes up 
1 you lose. If it comes up $k > 1$, you can either take a payoff of $k^2$ or you can
play again. Hence, the final payoff is either 0 (if you roll a 1) or otherwise the
square of the value of your final roll. 

(a) What is the optimal strategy in this game and what is the expected 
winnings if one uses the optimal strategy? 

(b) Suppose that it costs r to play the game each time. What is the smallest 
value of r such that the optimal strategy is to play if one rolls a 2 and to stop 
if one rolls any other number? 


4.7 If $u_1(y), u_2(y), \ldots$ are all functions that are superharmonic at $x$ for $P$, i.e.,

$$P u_i(x) \le u_i(x),$$

and we let $u$ be the function

$$u(y) = \inf_i u_i(y),$$

show that $u$ is superharmonic at $x$ for $P$.


4.8 Consider a simple “Wheel of Fortune” game. A wheel is divided into 
12 equal-sized wedges. Eleven of the wedges are marked with the numbers
100, 200,... , 1100 denoting an amount of money won if the wheel lands on 
those numbers. The twelfth wedge is marked “bankrupt.” A player can spin 
as many times as he or she wants. Each time the wheel lands on a numbered 
wedge, the player receives that much money which is added to his/her previous 
winnings. However, if the wheel ever lands on the “bankrupt” wedge, the 
player loses all of his/her money that has been won up to that point. The 
player may quit at any time, and take all the money he or she has won 
(assuming the “bankrupt” wedge has not come up). 

Assuming that the goal is to maximize one’s expected winnings in this game, 
devise an optimal strategy for playing this game and compute one’s expected 
winnings. You may wish to try a computer simulation first. 


4.9 Suppose $X_n$ is a random walk with absorbing boundary on $\{0, 1, 2, \ldots\}$ with

$$p(n, n+1) = p(n, n-1) = \frac{1}{2}, \quad n \ge 1.$$

Suppose our payoff function is $f(n) = n^2$. Let us try to find a stopping time $T$ that will maximize $E[f(X_T)]$.

(a) Show that if $X_n > 0$, then

$$E[f(X_{n+1}) \mid X_n] > f(X_n).$$


Conclude that any optimal strategy does not stop at any integer greater than 
0. 




(b) Since the random walk is recurrent, we know that we will eventually 
reach 0 at which point we stop and receive a payoff of 0. Since our “optimal” 
strategy tells us never to stop before then, our eventual payoff in the optimal 
strategy is 0. Clearly something is wrong here—any ideas? 


4.10 We have been restricting ourselves to stopping rules $T$ that do not look into the future. Suppose we can look into the future so that we always know when we reach the site that will have the highest payoff. Explain why the expected payoff is

$$v_{\mathrm{prop}}(x) = E\left[\max_n f(X_n) \mid X_0 = x\right].$$

The subscript prop stands for “prophet.” Clearly $v_{\mathrm{prop}}(x) \ge v(x)$.

(a) Find $v_{\mathrm{prop}}$ for the die game discussed at the beginning of the chapter (where the game stops whenever a 6 is rolled).

(b) Find $v_{\mathrm{prop}}$ for the chain and payoff function in Exercise 4.1.


Chapter 5 


Martingales 


5.1 Conditional Expectation 


To understand martingales, which are a model for “fair games,” we first need 
to understand conditional expectation. We start with some easy examples and 
build up to a general definition. Suppose Y is a random variable measuring 
the outcome of some random experiment. If one knows nothing about the 
outcome of the experiment, then the best guess for the value of Y is E(Y), 
the expectation. Of course, if one has complete knowledge of the outcome of 
the experiment, then one knows the exact value of Y. Conditional expectation 
deals with making the best guess for Y given some but not all information 
about the outcome. We will start by discussing the conditional expectation 
of a random variable Y with respect to a finite number of random variables 
X1,...,Xn and then finish by discussing the conditional expectation with 
respect to an infinite number of random variables. 

Suppose that X and Y are discrete random variables with joint probability 
density function 


$$f(x, y) = P\{X = x, Y = y\}$$

and marginal probability density functions

$$f_X(x) = \sum_y f(x, y), \qquad f_Y(y) = \sum_x f(x, y).$$

To define the conditional expectation of $Y$ given $X$, $E(Y \mid X)$, we need to give the best value of $Y$ for any value of $x$. A little thought will show that we should define

$$E(Y \mid X)(x) = \sum_y y \, P\{Y = y \mid X = x\} = \frac{\sum_y y \, P\{X = x, Y = y\}}{P\{X = x\}} = \frac{\sum_y y \, f(x, y)}{f_X(x)}.$$




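This is well defined if $f_X(x) > 0$, and we do not bother to define $E(Y \mid X)(x)$ for other values of $x$ since such values occur with probability 0. As an example suppose that two independent dice are rolled and $X$ denotes the value of the first roll and $Y$ denotes the sum of the two rolls. Then

$$f(x, y) = \frac{1}{36}, \quad x = 1, 2, \ldots, 6, \quad y = x+1, x+2, \ldots, x+6,$$

and

$$E(Y \mid X)(x) = \sum_{y=x+1}^{x+6} y \, \frac{f(x, y)}{f_X(x)} = \frac{1}{6} \sum_{y=x+1}^{x+6} y = x + \frac{7}{2}.$$
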
Similarly, if $X_1, \ldots, X_n, Y$ are discrete random variables with joint probability density function

$$f(x_1, \ldots, x_n, y) = P\{X_1 = x_1, \ldots, X_n = x_n, Y = y\},$$

and the marginal density with respect to $X_1, \ldots, X_n$ is given by

$$g(x_1, \ldots, x_n) = \sum_y f(x_1, \ldots, x_n, y),$$

then the conditional expectation of $Y$ given $X_1, \ldots, X_n$ is given by

$$E(Y \mid X_1, \ldots, X_n)(x_1, \ldots, x_n) = \frac{\sum_y y \, f(x_1, \ldots, x_n, y)}{g(x_1, \ldots, x_n)}.$$

This is well defined if $x_1, \ldots, x_n$ is a possible outcome for the experiment, i.e., if $g(x_1, \ldots, x_n) > 0$. Again, we think of $E(Y \mid X_1, \ldots, X_n)$ as being the best guess for the value of $Y$ given the values of $X_1, \ldots, X_n$.
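
The two-dice example can be checked by direct enumeration. The following Python sketch builds the joint probability function of $X$ (first roll) and $Y$ (sum of the two rolls) and applies the defining formula for the discrete case; the helper name `cond_exp_Y_given_X` is ours and purely illustrative.

```python
from fractions import Fraction
from itertools import product

# Joint probability function f(x, y) of X = first die and Y = sum of both dice.
joint = {}
for d1, d2 in product(range(1, 7), repeat=2):
    key = (d1, d1 + d2)
    joint[key] = joint.get(key, 0) + Fraction(1, 36)


def cond_exp_Y_given_X(x):
    """E(Y | X)(x) = sum_y y f(x, y) / f_X(x), computed exactly."""
    f_X = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / f_X


table = {x: cond_exp_Y_given_X(x) for x in range(1, 7)}
for x, value in table.items():
    print(x, value)
```

Each printed value is the first roll plus $7/2$, the mean of the second die added to the known first roll.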

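If $X$ and $Y$ are continuous random variables with joint density $f(x, y)$ and marginal densities

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y) \, dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx,$$

then the conditional expectation of $Y$ given $X$ is defined in an analogous way

$$E(Y \mid X)(x) = \frac{\int_{-\infty}^{\infty} y \, f(x, y) \, dy}{f_X(x)},$$

which is well defined for $f_X(x) > 0$. Similarly if $X_1, \ldots, X_n, Y$ have joint density $f(x_1, \ldots, x_n, y)$,

$$E(Y \mid X_1, \ldots, X_n)(x_1, \ldots, x_n) = \frac{\int_{-\infty}^{\infty} y \, f(x_1, \ldots, x_n, y) \, dy}{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)}.$$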


The conditional expectation $E(Y \mid X_1, \ldots, X_n)$ is characterized by two properties:

1. The value of the random variable $E(Y \mid X_1, \ldots, X_n)$ depends only on the values of $X_1, \ldots, X_n$, i.e., we can write $E(Y \mid X_1, \ldots, X_n) = \phi(X_1, \ldots, X_n)$ for some function $\phi$. If a random variable $Z$ can be written as a function of $X_1, \ldots, X_n$ it is called measurable with respect to $X_1, \ldots, X_n$. (For those who know measure theory, the function must be Borel measurable.)


2. Suppose $A$ is any event that depends only on $X_1, \ldots, X_n$. For example, $A$ might be the event

$$A = \{a_1 \le X_1 \le b_1, \ldots, a_n \le X_n \le b_n\}.$$

Let $I_A$ denote the indicator function of $A$, i.e., the random variable which equals 1 if $A$ occurs and 0 otherwise. Then

$$E(Y I_A) = E[E(Y \mid X_1, \ldots, X_n) \, I_A]. \qquad (5.1)$$


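Let us derive the last equality in the case where $X_1, \ldots, X_n, Y$ are continuous random variables with density $f(x_1, \ldots, x_n, y)$ and $A$ is the above event; the derivation for discrete random variables is essentially the same.

$$\begin{aligned}
E[E(Y \mid X_1, \ldots, X_n) \, I_A]
&= \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} E(Y \mid X_1, \ldots, X_n)(x_1, \ldots, x_n) \, f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) \, dx_n \cdots dx_1 \\
&= \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} \left[ \frac{\int_{-\infty}^{\infty} y \, f(x_1, \ldots, x_n, y) \, dy}{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)} \right] f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) \, dx_n \cdots dx_1 \\
&= \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} \int_{-\infty}^{\infty} y \, f(x_1, \ldots, x_n, y) \, dy \, dx_n \cdots dx_1 \\
&= E(Y I_A).
\end{aligned}$$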


Conditions 1 and 2 give a complete characterization of the conditional expectation.


Fact. $E(Y \mid X_1, \ldots, X_n)$ is the unique random variable which depends only on $X_1, \ldots, X_n$ and which satisfies (5.1) for every event $A$ that depends only on $X_1, \ldots, X_n$.


In measure theoretic treatments of probability, the conditional expectation 
is defined as the random variable satisfying conditions 1 and 2 and then it 
is proved that this uniquely defines a random variable (up to an event of 
probability zero). For our purposes, the characterization will be useful in 
deriving some properties of conditional expectation. 

We will make the notation a little more compact. If $X_1, X_2, \ldots$ is a sequence of random variables we will use $\mathcal{F}_n$ to denote the “information contained in $X_1, \ldots, X_n$.” We will write $E(Y \mid \mathcal{F}_n)$ for $E(Y \mid X_1, \ldots, X_n)$. If we apply (5.1) to the event $A$ consisting of the entire sample space (so that $I_A = 1$) we get

$$E[E(Y \mid \mathcal{F}_n)] = E(Y). \qquad (5.2)$$

Conditional expectation is a linear operation: if $a, b$ are constants, then

$$E(aY_1 + bY_2 \mid \mathcal{F}_n) = a \, E(Y_1 \mid \mathcal{F}_n) + b \, E(Y_2 \mid \mathcal{F}_n). \qquad (5.3)$$


To prove this, we need only note that the right-hand side is measurable with respect to $X_1, \ldots, X_n$ and satisfies (5.1). The next two properties can be derived similarly. If $Y$ is already a function of $X_1, \ldots, X_n$ then

$$E(Y \mid \mathcal{F}_n) = Y. \qquad (5.4)$$

For any $Y$, if $m < n$, then

$$E(E(Y \mid \mathcal{F}_n) \mid \mathcal{F}_m) = E(Y \mid \mathcal{F}_m). \qquad (5.5)$$


If $Y$ is independent of $X_1, \ldots, X_n$, then information about $X_1, \ldots, X_n$ should not be useful in determining $Y$ and

$$E(Y \mid \mathcal{F}_n) = E(Y). \qquad (5.6)$$

This can be derived easily from (5.1) since in this case $Y$ and $I_A$ are independent random variables. The last property we will need is a little trickier: if $Y$ is any random variable and $Z$ is a random variable that is measurable with respect to $X_1, \ldots, X_n$, then

$$E(YZ \mid \mathcal{F}_n) = Z \, E(Y \mid \mathcal{F}_n). \qquad (5.7)$$


It is clear that the right-hand side is measurable with respect to X1,..., Xn, 
so it suffices to show that it satisfies (5.1). We will not prove it here; the basic 
idea is to approximate Z by simple functions, for which (5.1) can be derived 
easily, and pass to the limit. 


Example 1. Suppose $X_1, X_2, \ldots$ are independent, identically distributed random variables with mean $\mu$. Let $S_n$ denote the partial sum

$$S_n = X_1 + \cdots + X_n.$$

Let $\mathcal{F}_n$ denote the information in $X_1, \ldots, X_n$. Suppose $m < n$. Then by (5.3),

$$E(S_n \mid \mathcal{F}_m) = E(X_1 + \cdots + X_m \mid \mathcal{F}_m) + E(X_{m+1} + \cdots + X_n \mid \mathcal{F}_m).$$

Since $X_1 + \cdots + X_m$ is measurable with respect to $X_1, \ldots, X_m$, by (5.4),

$$E(X_1 + \cdots + X_m \mid \mathcal{F}_m) = X_1 + \cdots + X_m = S_m.$$

Since $X_{m+1} + \cdots + X_n$ is independent of $X_1, \ldots, X_m$, by (5.6),

$$E(X_{m+1} + \cdots + X_n \mid \mathcal{F}_m) = E(X_{m+1} + \cdots + X_n) = (n - m)\mu.$$

Therefore,

$$E(S_n \mid \mathcal{F}_m) = S_m + (n - m)\mu. \qquad (5.8)$$


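Example 2. Suppose $X_1, X_2, \ldots$ and $S_n$ are as in Example 1. Suppose $\mu = 0$ and $\mathrm{Var}(X_i) = E(X_i^2) = \sigma^2$. Let $m < n$. Then, by (5.3),

$$\begin{aligned}
E(S_n^2 \mid \mathcal{F}_m) &= E([S_m + (S_n - S_m)]^2 \mid \mathcal{F}_m) \\
&= E(S_m^2 \mid \mathcal{F}_m) + 2E(S_m(S_n - S_m) \mid \mathcal{F}_m) + E((S_n - S_m)^2 \mid \mathcal{F}_m).
\end{aligned}$$

Since $S_m$ depends only on $X_1, \ldots, X_m$ and $S_n - S_m$ is independent of $X_1, \ldots, X_m$, we have as in the previous example

$$E(S_m^2 \mid \mathcal{F}_m) = S_m^2,$$

$$E((S_n - S_m)^2 \mid \mathcal{F}_m) = E((S_n - S_m)^2) = \mathrm{Var}(S_n - S_m) = (n - m)\sigma^2.$$

Finally, by (5.7),

$$E(S_m(S_n - S_m) \mid \mathcal{F}_m) = S_m E(S_n - S_m \mid \mathcal{F}_m) = S_m E(S_n - S_m) = 0.$$

Therefore,

$$E(S_n^2 \mid \mathcal{F}_m) = S_m^2 + (n - m)\sigma^2.$$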
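
The identity of Example 2 can be verified exhaustively in a small case. The sketch below assumes steps $X_i = \pm 1$ with probability $1/2$ each (so $\mu = 0$, $\sigma^2 = 1$); the function name `check_second_moment` is ours. For every possible value of $X_1, \ldots, X_m$ it averages $S_n^2$ over all equally likely continuations and compares with $S_m^2 + (n - m)$.

```python
from fractions import Fraction
from itertools import product


def check_second_moment(m, n):
    """Compare E(S_n^2 | X_1..X_m) with S_m^2 + (n - m) on every prefix,
    for independent steps X_i = +1 or -1 with probability 1/2 each."""
    tails = list(product([-1, 1], repeat=n - m))
    for prefix in product([-1, 1], repeat=m):
        s_m = sum(prefix)
        # Conditional expectation: average over the equally likely continuations.
        cond = Fraction(sum((s_m + sum(t)) ** 2 for t in tails), len(tails))
        if cond != s_m ** 2 + (n - m):
            return False
    return True


print(check_second_moment(3, 7))
```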


Example 3. Consider a special case of Example 1 where the random variable $X_i$ has a Bernoulli distribution, $P\{X_i = 1\} = p$, $P\{X_i = 0\} = 1 - p$. Again assume that $m < n$. For any $i \le m$, consider $E(X_i \mid S_n)$. If $S_n = k$ then there are $k$ 1s in the first $n$ trials. Given $S_n = k$ it is an easy exercise in conditional probability to show that $P\{X_i = 1 \mid S_n = k\} = k/n$. Hence,

$$E(X_i \mid S_n) = \frac{S_n}{n},$$

and

$$E(S_m \mid S_n) = E(X_1 \mid S_n) + \cdots + E(X_m \mid S_n) = \frac{m}{n} S_n.$$
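
As a check on Example 3, the conditional expectation $E(S_m \mid S_n = k)$ can be computed exactly by summing over all $0$-$1$ sequences of length $n$. The sketch below is illustrative only; the function name and the particular choices $m = 2$, $n = 5$, $p = 1/3$ are ours.

```python
from fractions import Fraction
from itertools import product


def cond_exp_Sm_given_Sn(m, n, p):
    """Return {k: E(S_m | S_n = k)} for i.i.d. Bernoulli(p) trials, 0 < p < 1."""
    num, den = {}, {}
    for xs in product([0, 1], repeat=n):
        k = sum(xs)
        w = p ** k * (1 - p) ** (n - k)      # probability of this sequence
        num[k] = num.get(k, 0) + w * sum(xs[:m])
        den[k] = den.get(k, 0) + w
    return {k: num[k] / den[k] for k in den}


result = cond_exp_Sm_given_Sn(2, 5, Fraction(1, 3))
for k in sorted(result):
    print(k, result[k])
```

The answer does not depend on $p$: given $S_n = k$, all arrangements of the $k$ ones are equally likely.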


We will need to consider conditional expectations with respect to an infinite collection of random variables,

$$E(Y \mid X_\alpha, \alpha \in \mathcal{A}).$$

Let $\mathcal{F}$ denote the information in $\{X_\alpha\}$. A random variable $Z$ is $\mathcal{F}$-measurable if knowledge of all the $\{X_\alpha\}$ determines $Z$. Essentially, $Z$ is $\mathcal{F}$-measurable if $Z = \phi(X_{\alpha_1}, \ldots, X_{\alpha_n})$ for some function $\phi$ and some finite subcollection $X_{\alpha_1}, \ldots, X_{\alpha_n}$, or if $Z$ is a limit of such random variables. As an example, suppose $Y, X_1, X_2, \ldots$ are independent random variables with $X_1, X_2, \ldots$ normal mean zero, variance one and $Y$ having some unknown nontrivial distribution. Let

$$Z_i = X_i + Y.$$

Let $\mathcal{F}_n$ denote the information in $Z_1, \ldots, Z_n$ and let $\mathcal{F}_\infty$ denote the information in $Z_1, Z_2, \ldots$. One cannot determine the value of $Y$ given $Z_1, \ldots, Z_n$, so $Y$ is not $\mathcal{F}_n$-measurable. However, $Y$ is $\mathcal{F}_\infty$-measurable since the law of large numbers implies

$$Y = \lim_{n \to \infty} \frac{Z_1 + \cdots + Z_n}{n}.$$

If $\mathcal{F}$ denotes the information contained in $\{X_\alpha\}$, we will say that an event $A$ is $\mathcal{F}$-measurable if one can determine whether or not the event has occurred if one knows the values of $\{X_\alpha\}$. This is equivalent to saying that the indicator function $I_A$ is an $\mathcal{F}$-measurable random variable. The conditional expectation $E(Y \mid \mathcal{F})$ is defined to be the unique $\mathcal{F}$-measurable random variable $Z$ such that for every $\mathcal{F}$-measurable event $A$,

$$E(Y I_A) = E(Z I_A).$$


All of the properties (5.2) through (5.7) hold for this more general conditional 
expectation. 


5.2 Definition and Examples 


A martingale is a model of a fair game. We will let $\{\mathcal{F}_n\}$ denote an increasing collection of information. By this we mean for each $n$, we have a collection of random variables $\mathcal{A}_n$ such that $\mathcal{A}_m \subset \mathcal{A}_n$ if $m < n$. The information that we have at time $n$ is the value of all of the variables in $\mathcal{A}_n$. The assumption $\mathcal{A}_m \subset \mathcal{A}_n$ means that we do not lose information. A random variable $X$ is $\mathcal{F}_n$-measurable if we can determine the value of $X$ if we know the value of all the random variables in $\mathcal{A}_n$. The increasing sequence of information $\mathcal{F}_n$ is often called a filtration.

We say that a sequence of random variables $M_0, M_1, M_2, \ldots$ with $E(|M_i|) < \infty$ is a martingale with respect to $\{\mathcal{F}_n\}$ if each $M_n$ is measurable with respect to $\mathcal{F}_n$, and for each $m < n$,

$$E(M_n \mid \mathcal{F}_m) = M_m, \qquad (5.9)$$

or equivalently,

$$E(M_n - M_m \mid \mathcal{F}_m) = 0.$$


The condition $E(|M_i|) < \infty$ is needed to guarantee that the conditional expectations are well defined. If $\mathcal{F}_n$ is the information in random variables $X_1, \ldots, X_n$, then we will also say that $M_0, M_1, \ldots$ is a martingale with respect to $X_0, X_1, \ldots$. Sometimes we will just say $M_0, M_1, \ldots$ is a martingale without making reference to the filtration $\mathcal{F}_n$. In this case it will mean that the sequence $M_n$ is a martingale with respect to itself (in which case the first condition is trivially true). In order to verify (5.9) it suffices to prove that for all $n$,

$$E(M_{n+1} \mid \mathcal{F}_n) = M_n,$$

since if this holds, by (5.5),

$$E(M_{n+2} \mid \mathcal{F}_n) = E(E(M_{n+2} \mid \mathcal{F}_{n+1}) \mid \mathcal{F}_n) = E(M_{n+1} \mid \mathcal{F}_n) = M_n,$$

and so on.


Example 1. Let $X_1, X_2, \ldots$ be independent random variables each with mean $\mu$. Let $S_0 = 0$ and for $n > 0$ let $S_n$ be the partial sum

$$S_n = X_1 + \cdots + X_n.$$

Then $M_n = S_n - n\mu$ is a martingale with respect to $\mathcal{F}_n$, the information contained in $X_1, \ldots, X_n$. This can easily be checked by using Example 1 of Section 5.1,

$$E(M_{n+1} \mid \mathcal{F}_n) = E(S_{n+1} - (n+1)\mu \mid \mathcal{F}_n) = E(S_{n+1} \mid \mathcal{F}_n) - (n+1)\mu = (S_n + \mu) - (n+1)\mu = M_n.$$

In particular, if $\mu = 0$, then $S_n$ is a martingale with respect to $\mathcal{F}_n$.


Example 2. The following is a version of the “martingale betting strategy” which is a way to beat a fair game. Suppose $X_1, X_2, \ldots$ are independent random variables with

$$P\{X_i = 1\} = P\{X_i = -1\} = \frac{1}{2}.$$

We can think of the random variables $X_i$ as the results of a game where one flips a coin and wins \$1 if it comes up heads and loses \$1 if it comes up tails. One way to beat the game is to keep doubling our bet until we eventually win. At this point we stop. Let $W_n$ denote the winnings (or losses) up through $n$ flips of the coin using this strategy. $W_0 = 0$. Whenever we win we stop playing, so our winnings stop changing and

$$P\{W_{n+1} = 1 \mid W_n = 1\} = 1.$$

Now suppose the first $n$ flips of the coin have turned up tails. After each flip we have doubled our bet, so we have lost $1 + 2 + 4 + \cdots + 2^{n-1} = 2^n - 1$ dollars and $W_n = -(2^n - 1)$. At this time we double our bet again and wager $2^n$ on the next flip. This gives

$$P\{W_{n+1} = 1 \mid W_n = -(2^n - 1)\} = \frac{1}{2},$$

$$P\{W_{n+1} = -(2^{n+1} - 1) \mid W_n = -(2^n - 1)\} = \frac{1}{2}.$$

It is then easy to verify that

$$E(W_{n+1} \mid \mathcal{F}_n) = W_n,$$

and hence $W_n$ is a martingale with respect to $\mathcal{F}_n$.
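
To see the doubling strategy concretely, one can compute the exact distribution of $W_n$ step by step. In the sketch below (the function name `doubling_distribution` is ours), the bet on flip $k+1$ is $2^k$ as long as no win has occurred, and the winnings freeze at 1 after the first win.

```python
from fractions import Fraction


def doubling_distribution(n):
    """Exact distribution {w: P(W_n = w)} of the winnings after n flips."""
    dist = {0: Fraction(1)}            # W_0 = 0
    for k in range(n):
        new = {}
        for w, p in dist.items():
            if w == 1:                 # already won: winnings stop changing
                new[1] = new.get(1, 0) + p
            else:                      # all flips so far were tails: bet 2^k
                bet = 2 ** k
                for outcome in (w + bet, w - bet):
                    new[outcome] = new.get(outcome, 0) + p / 2
        dist = new
    return dist


for n in (1, 2, 5, 10):
    dist = doubling_distribution(n)
    mean = sum(w * p for w, p in dist.items())
    print(n, dict(dist), mean)
```

For each $n$ the mean is 0, consistent with the martingale property, even though the probability of being ahead by 1 tends to 1; the rare event of $n$ straight losses carries a loss of $2^n - 1$.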


Example 3. We can generalize the previous example. Suppose $X_1, X_2, \ldots$ are as in Example 2. Suppose that on the $n$th flip we make a bet equal to $B_n$. In determining the amount of the bet, we may look at the results of the first $n - 1$ flips but cannot look beyond that. In other words, $B_n$ is a random variable measurable with respect to $\mathcal{F}_{n-1}$ (we assume that $B_1$ is just a constant). The winnings after $n$ flips, $W_n$, are given by $W_0 = 0$ and

$$W_n = \sum_{j=1}^{n} B_j X_j.$$

We allow $B_n$ to be negative; this corresponds to betting that the coin will come up tails. Assume that $E(|B_n|) < \infty$ (which will be guaranteed if the bet at time $n$ is required to be less than some constant $C_n$). Then $W_n$ is a martingale with respect to $\mathcal{F}_n$. To see this, we first note that $E(|W_n|) < \infty$ follows from the fact that $E(|B_n|) < \infty$ for each $n$. It is clear that $W_n$ is $\mathcal{F}_n$-measurable. Finally,

$$E(W_{n+1} \mid \mathcal{F}_n) = E\Big(\sum_{j=1}^{n+1} B_j X_j \mid \mathcal{F}_n\Big) = E\Big(\sum_{j=1}^{n} B_j X_j \mid \mathcal{F}_n\Big) + E(B_{n+1} X_{n+1} \mid \mathcal{F}_n).$$

By (5.4),

$$E\Big(\sum_{j=1}^{n} B_j X_j \mid \mathcal{F}_n\Big) = \sum_{j=1}^{n} B_j X_j = W_n.$$

Since $B_{n+1}$ is $\mathcal{F}_n$-measurable, it follows from (5.7) and (5.6) that

$$E(B_{n+1} X_{n+1} \mid \mathcal{F}_n) = B_{n+1} E(X_{n+1} \mid \mathcal{F}_n) = B_{n+1} E(X_{n+1}) = 0.$$

Therefore,

$$E(W_{n+1} \mid \mathcal{F}_n) = W_n.$$

Example 4. Polya’s Urn. Consider an urn with balls of two colors, red and green. Assume that initially there is one ball of each color in the urn. At each time step, a ball is chosen at random from the urn. If a red ball is chosen, it is returned and in addition another red ball is added to the urn. Similarly, if a green ball is chosen, it is returned together with another green ball. Let $X_n$ denote the number of red balls in the urn after $n$ draws. Then $X_0 = 1$ and $X_n$ is a (time-inhomogeneous) Markov chain with transitions

$$P\{X_{n+1} = k + 1 \mid X_n = k\} = \frac{k}{n+2},$$

$$P\{X_{n+1} = k \mid X_n = k\} = \frac{n + 2 - k}{n+2}.$$

Let $M_n = X_n/(n + 2)$ be the fraction of red balls after $n$ draws. Then $M_n$ is a martingale. To see this, note that

$$E(X_{n+1} \mid X_n) = X_n + \frac{X_n}{n+2}.$$

Since this is a Markov chain, all the relevant information in $\mathcal{F}_n$ for determining $X_{n+1}$ is contained in $X_n$. Therefore,

$$E(M_{n+1} \mid \mathcal{F}_n) = E((n+3)^{-1} X_{n+1} \mid X_n) = \frac{1}{n+3}\left[X_n + \frac{X_n}{n+2}\right] = \frac{X_n}{n+2} = M_n.$$

A process $M_n$ with $E(|M_n|) < \infty$ is called a submartingale (supermartingale) with respect to $\{\mathcal{F}_n\}$ if for each $m < n$, $E(M_n \mid \mathcal{F}_m) \ge (\le) \; M_m$. In other words, a submartingale is a game in one’s favor and a supermartingale is an unfair game. Note that $M_n$ is a martingale if and only if it is both a submartingale and a supermartingale.


Example 5. Let $X_n$ be a Markov chain with finite state space. Suppose a payoff function $f$ is given as in Chapter 4. Let $v$ be the value function associated with the payoff function, $v(x) = E(f(X_T) \mid X_0 = x)$, where $T$ is the stopping rule associated with the optimal strategy. Then $M_n = v(X_n)$ is a supermartingale with respect to $X_0, X_1, \ldots$.


5.3 Optional Sampling Theorem


The optional sampling theorem states in effect, “You cannot beat a fair 
game.” However, it is easy to see that this statement is false in complete 
generality. For example, suppose one plays the fair game of flipping a coin, 
winning one’s bet if the coin comes up heads and losing one’s bet if it is 
tails. Then using the “martingale betting strategy” described in Example 2 
of Section 5.2, one can guarantee that one finishes the game ahead. 

A stopping time $T$ with respect to a filtration $\{\mathcal{F}_n\}$ is a random variable taking values in the nonnegative integers (we allow $T = \infty$ as a possible value) that gives the time at which some procedure is stopped ($T = \infty$ corresponds to never stopping), such that the decision whether to stop at time $n$ must be made using only the information available at time $n$. More precisely, we say that $T$ is a stopping time (with respect to $\{\mathcal{F}_n\}$) if for each $n$, the event $\{T = n\}$ is measurable with respect to $\mathcal{F}_n$.


Example 1. Let $k$ be an integer and let $T = k$.

Example 2. Let $A$ be a set and let $T_A = \min\{j : X_j \in A\}$.


Example 3. If $T$ and $U$ are stopping times, then so are $\min\{T, U\}$ and $\max\{T, U\}$. In particular, if $T$ is a stopping time and $T_n = \min\{T, n\}$, then each $T_n$ is a stopping time, $T_0 \le T_1 \le T_2 \le \cdots$, and $T_n \le n$.


The optional sampling (or optional stopping) theorem states that (under certain conditions) if $M_n$ is a martingale and $T$ is a stopping time then $E(M_T) = E(M_0)$. This will not hold under all conditions since if we consider the martingale betting strategy and let $T$ be the first time that the coin comes up heads, then $1 = E(M_T) \ne E(M_0) = 0$. The first thing we would like to show is that there is no way to beat a fair game if one has only a finite amount of time.


Fact. Suppose $M_0, M_1, \ldots$ is a martingale with respect to $\{\mathcal{F}_n\}$ and suppose $T$ is a stopping time. Suppose that $T$ is bounded, $T \le K$. Then

$$E(M_T \mid \mathcal{F}_0) = M_0.$$

In particular, $E(M_T) = E(M_0)$.


To prove this fact, we first note that the event $\{T > n\}$ is measurable with respect to $\mathcal{F}_n$ (since we need only the information up through time $n$ to determine if we have stopped by time $n$). Since $M_T$ is the random variable which equals $M_j$ if $T = j$ we can write

$$M_T = \sum_{j=0}^{K} M_j \, I\{T = j\}.$$

Let us take the conditional expectation with respect to $\mathcal{F}_{K-1}$,

$$E(M_T \mid \mathcal{F}_{K-1}) = E(M_K I\{T = K\} \mid \mathcal{F}_{K-1}) + \sum_{j=0}^{K-1} E(M_j I\{T = j\} \mid \mathcal{F}_{K-1}).$$

For $j \le K - 1$, $M_j I\{T = j\}$ is $\mathcal{F}_{K-1}$-measurable; hence

$$E(M_j I\{T = j\} \mid \mathcal{F}_{K-1}) = M_j I\{T = j\}.$$

Since $T$ is known to be no more than $K$, the event $\{T = K\}$ is the same as the event $\{T > K - 1\}$. The latter event is measurable with respect to $\mathcal{F}_{K-1}$. Hence, using (5.7),

$$\begin{aligned}
E(M_K I\{T = K\} \mid \mathcal{F}_{K-1}) &= E(M_K I\{T > K - 1\} \mid \mathcal{F}_{K-1}) \\
&= I\{T > K - 1\} \, E(M_K \mid \mathcal{F}_{K-1}) \\
&= I\{T > K - 1\} \, M_{K-1}.
\end{aligned}$$

The last equality follows from the fact that $M_n$ is a martingale. Therefore,

$$\begin{aligned}
E(M_T \mid \mathcal{F}_{K-1}) &= I\{T > K - 1\} M_{K-1} + \sum_{j=0}^{K-1} M_j I\{T = j\} \\
&= I\{T > K - 2\} M_{K-1} + \sum_{j=0}^{K-2} M_j I\{T = j\}.
\end{aligned}$$

If we work through this argument again, this time conditioning with respect to $\mathcal{F}_{K-2}$, we get

$$\begin{aligned}
E(M_T \mid \mathcal{F}_{K-2}) &= E(E(M_T \mid \mathcal{F}_{K-1}) \mid \mathcal{F}_{K-2}) \\
&= I\{T > K - 3\} M_{K-2} + \sum_{j=0}^{K-3} M_j I\{T = j\}.
\end{aligned}$$

We can continue this process until we get

$$E(M_T \mid \mathcal{F}_0) = M_0.$$
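
The fact for bounded stopping times can be checked by brute force in a simple case. As an illustration (the choice of martingale, stopping rule, and function name are ours), take $M_n$ to be simple random walk started at 0 and let $T$ be the first time the walk reaches a given level, capped at $K$. The sketch averages $M_T$ over all $2^K$ equally likely coin sequences.

```python
from fractions import Fraction
from itertools import product


def expected_stopped_value(K, level):
    """E(M_T) for simple random walk M with M_0 = 0 and
    T = min(first time M hits `level`, K), by enumerating all paths."""
    total = 0
    for steps in product([-1, 1], repeat=K):
        s = 0
        for x in steps:
            if s == level:        # the level has been reached: stop here
                break
            s += x
        total += s                # s is the value M_T on this path
    return Fraction(total, 2 ** K)


for K in (1, 4, 8):
    print(K, [expected_stopped_value(K, level) for level in (1, 2, 3)])
```

Every entry is 0: with only a bounded amount of time, stopping at a favorable level cannot change the expected value.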


There are many examples of interest where the stopping time $T$ is not bounded. Suppose $T$ is a stopping time with $P\{T < \infty\} = 1$, i.e., a rule that guarantees that one stops eventually. (Note that the time associated to the martingale betting strategy satisfies this condition.) When can we conclude that $E(M_T) = E(M_0)$? To investigate this consider the stopping times $T_n = \min\{T, n\}$. Note that

$$M_T = M_{T_n} + M_T \, I\{T > n\} - M_n \, I\{T > n\}.$$

Hence,

$$E(M_T) = E(M_{T_n}) + E(M_T \, I\{T > n\}) - E(M_n \, I\{T > n\}).$$

Since $T_n$ is a bounded stopping time, it follows from the above that $E(M_{T_n}) = E(M_0)$. We would like to be able to say that the other two terms do not contribute as $n \to \infty$. The second term is not much of a problem. Since the probability of the event $\{T > n\}$ goes to 0 as $n \to \infty$, we are taking the expectation of the random variable $M_T$ restricted to a smaller and smaller set. One can show (see Section 5.4) that if $E(|M_T|) < \infty$ then $E(|M_T| \, I\{T > n\}) \to 0$.

The third term is more troublesome. Consider Example 2 of Section 5.2 again. In this example, the event $\{T > n\}$ is the event that the first $n$ flips are tails and has probability $2^{-n}$. If this event occurs, the bettor has lost a total of $2^n - 1$ dollars, i.e., $M_n = 1 - 2^n$. Hence

$$E(M_n \, I\{T > n\}) = 2^{-n}(1 - 2^n),$$

which does not go to 0 as $n \to \infty$. This is why the desired result fails in this case. However, if $M_n$ and $T$ are given satisfying

$$\lim_{n \to \infty} E(|M_n| \, I\{T > n\}) = 0,$$

then we will be able to conclude that $E(M_T) = E(M_0)$. We summarize this as follows.


Optional Sampling Theorem. Suppose $M_0, M_1, \ldots$ is a martingale with respect to $\{\mathcal{F}_n\}$ and $T$ is a stopping time satisfying $P\{T < \infty\} = 1$,

$$E(|M_T|) < \infty, \qquad (5.10)$$

and

$$\lim_{n \to \infty} E(|M_n| \, I\{T > n\}) = 0. \qquad (5.11)$$

Then, $E(M_T) = E(M_0)$.


Example 1. Let $X_n$ be a simple random walk ($p = 1/2$) on $\{0, \ldots, N\}$ with absorbing boundaries. Suppose $X_0 = a$. Then, $X_n$ is a martingale. Let $T = \min\{j : X_j = 0 \text{ or } N\}$. $T$ is a stopping time, and since $X_n$ is bounded, (5.10) and (5.11) are satisfied [note that (5.10) and (5.11) are always satisfied if the martingale is bounded and $P\{T < \infty\} = 1$]. Therefore, with $M_n = X_n$,

$$E(M_T) = E(M_0) = a.$$

But in this case $E(M_T) = N \, P\{X_T = N\}$. Therefore,

$$P\{X_T = N\} = \frac{a}{N}.$$

This gives another derivation of the gambler’s ruin result for simple random walk.


Example 2. Let $X_n$ be as in Example 1 and let $M_n = X_n^2 - n$. Then $M_n$ is a martingale with respect to $X_n$. To see this, note that by Example 2, Section 5.1,

$$E(M_{n+1} \mid \mathcal{F}_n) = E(X_{n+1}^2 - (n+1) \mid \mathcal{F}_n) = X_n^2 + 1 - (n+1) = M_n.$$

Again, let $T = \min\{j : X_j = 0 \text{ or } N\}$. In this case, $M_n$ is not a bounded martingale so it is not immediate that (5.10) and (5.11) hold. However, one can prove (Exercise 1.7) that there exist $C < \infty$ and $\rho < 1$ such that

$$P\{T > n\} \le C\rho^n.$$

Since $|M_n| \le N^2 + n$, one can then show that $E(|M_T|) < \infty$ and

$$E(|M_n| \, I\{T > n\}) \le C\rho^n (N^2 + n) \to 0.$$

Hence the conditions of the optional sampling theorem hold and we can conclude

$$E(M_T) = E(M_0) = a^2.$$

Note that

$$E(M_T) = E(X_T^2) - E(T) = N^2 P\{X_T = N\} - E(T) = aN - E(T).$$

Hence,

$$E(T) = aN - a^2 = a(N - a).$$
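
Both gambler's ruin formulas, $P\{X_T = N\} = a/N$ and $E(T) = a(N - a)$, can be checked numerically without appealing to the optional sampling theorem. The sketch below (function name and the truncation at `n_steps` are our assumptions) propagates the distribution of the walk one step at a time and uses $E(T) = \sum_{n \ge 0} P\{T > n\}$; since $P\{T > n\}$ decays geometrically, the truncation error is negligible for the values used here.

```python
def ruin_statistics(a, N, n_steps=5000):
    """Return (P{X_T = N}, E(T)) for simple random walk on {0,...,N}
    started at a and absorbed at 0 and N, up to truncation at n_steps."""
    dist = [0.0] * (N + 1)
    dist[a] = 1.0
    expected_T = 0.0
    for _ in range(n_steps):
        expected_T += sum(dist[1:N])          # adds P{T > n} for the current n
        new = [0.0] * (N + 1)
        new[0], new[N] = dist[0], dist[N]     # absorbed mass stays put
        for x in range(1, N):
            new[x - 1] += dist[x] / 2
            new[x + 1] += dist[x] / 2
        dist = new
    return dist[N], expected_T


p_win, mean_T = ruin_statistics(a=3, N=10)
print(p_win, 3 / 10)
print(mean_T, 3 * (10 - 3))
```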


Example 3. Let $X_n$ be a simple random walk ($p = 1/2$) on the integers $\{\ldots, -1, 0, 1, \ldots\}$ with $X_0 = 0$. We have seen that this is a martingale. Let $T = \min\{j : X_j = 1\}$. Since simple random walk is recurrent, $P\{T < \infty\} = 1$. Note that $X_T = 1$ and hence

$$1 = E(X_T) \ne E(X_0) = 0.$$

Therefore, the conditions of the optional sampling theorem must not hold. We will not give the details here but it can be shown in this case that $P\{T > n\} \sim c \, n^{-1/2}$ for some constant $c$. By the central limit theorem, the random walk tends to go a distance of order $\sqrt{n}$ in $n$ steps. In this case $E(|X_n| \, I\{T > n\})$ does not go to 0.
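
The failure of (5.11) in Example 3 can be made quantitative. Applying the bounded case to $T_n = \min\{T, n\}$ gives $0 = E(X_{T_n}) = P\{T \le n\} + E(X_n \, I\{T > n\})$, and since $X_n \le 0$ on $\{T > n\}$ this says $E(|X_n| \, I\{T > n\}) = P\{T \le n\}$, which tends to 1 rather than 0. The sketch below (function name ours) confirms this identity exactly by tracking the walk killed on reaching 1.

```python
from fractions import Fraction


def killed_walk(n):
    """Distribution {x: P(X_n = x, T > n)} for simple random walk from 0,
    where T is the first time the walk hits 1."""
    alive = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for x, p in alive.items():
            for y in (x - 1, x + 1):
                if y != 1:                    # paths that reach 1 have stopped
                    new[y] = new.get(y, 0) + p / 2
        alive = new
    return alive


for n in (1, 10, 100):
    alive = killed_walk(n)
    prob_not_stopped = sum(alive.values())                  # P{T > n}
    third_term = sum(abs(x) * p for x, p in alive.items())  # E(|X_n| I{T > n})
    print(n, float(prob_not_stopped), float(third_term))
```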
Example 4. Example 1 can be extended to general Markov chains. Let $P$ be the transition matrix for the irreducible Markov chain $X_n$ on the finite state space $S$. Let $A$ be a subset of $S$ and let $F$ be a function from $A$ to $\mathbb{R}$. Then we claim that there is a unique function $f$ on $S$ satisfying

$$f(x) = F(x), \quad x \in A,$$

$$Pf(x) := \sum_{y \in S} p(x, y) f(y) = f(x), \quad x \in S \setminus A.$$

This is not surprising if one realizes that the last equation gives $k$ equations in $k$ unknowns where $k$ is the number of elements in $S \setminus A$. Suppose $f$ satisfies these conditions. Let $T = \min\{n \ge 0 : X_n \in A\}$ and $T_n = \min\{n, T\}$. Let $M_n = f(X_{T_n})$. Then $M_n$ is a bounded martingale, and hence,

$$f(x) = E[M_0 \mid X_0 = x] = E[M_T \mid X_0 = x] = E[F(X_T) \mid X_0 = x].$$

5.4 Uniform Integrability 


Condition (5.11) is often hard to verify. For this reason we would like to give 
conditions that may be easier to check from which we can conclude (5.11). 
We start by considering one random variable $X$ with $E(|X|) < \infty$. Let $F$ denote the distribution function for $|X|$. Then it follows that

$$\lim_{K \to \infty} E(|X| \, I\{|X| > K\}) = \lim_{K \to \infty} \int_K^\infty |x| \, dF(x) = 0.$$

Now suppose we have a sequence of random variables $X_1, X_2, \ldots$. We say that the sequence is uniformly integrable if for every $\epsilon > 0$ there exists a $K$ such that for each $n$,

$$E[|X_n| \, I\{|X_n| > K\}] < \epsilon.$$

It is important that $K$ does not depend on $n$. If $X_1, X_2, \ldots$ are uniformly integrable, then the following holds: for every $\epsilon > 0$, there is a $\delta > 0$ such that if $P(A) < \delta$, then

$$E(|X_n| \, I_A) < \epsilon \qquad (5.12)$$

for each $n$. Again, $\delta$ may not depend on $n$ and (5.12) must hold for all values of $n$. To see that uniform integrability implies this, let $\epsilon > 0$ and choose $K$ sufficiently large so that $E[|X_n| \, I\{|X_n| > K\}] < \epsilon/2$ for all $n$. If we let $\delta = \epsilon/(2K)$, then if $P(A) < \delta$,

$$E(|X_n| \, I_A) \le E(|X_n| \, I_A ; |X_n| \le K) + E(|X_n| ; |X_n| > K) \le K \, P(A) + (\epsilon/2) < \epsilon.$$

To develop a feeling for the definition, we will start by giving an example of random variables that are not uniformly integrable. Consider Example 2 of Section 5.2, the martingale betting strategy, and consider the random variables $W_0, W_1, W_2, \ldots$. If we let $A_n$ be the event $\{X_1 = X_2 = \cdots = X_n = -1\}$ then $P(A_n) = 2^{-n}$ and $E(|W_n| \, I_{A_n}) = 2^{-n}(2^n - 1) \to 1$. We clearly cannot satisfy the conditions for uniform integrability for any $\epsilon < 1$.

Now suppose that $M_0, M_1, \ldots$ is a uniformly integrable martingale with respect to $X_0, X_1, \ldots$ and $T$ is a stopping time with $P\{T < \infty\} = 1$. Then

$$\lim_{n \to \infty} P\{T > n\} = 0,$$

and hence by uniform integrability

$$\lim_{n \to \infty} E(|M_n| \, I\{T > n\}) = 0;$$

that is, (5.11) holds. We can therefore give another statement of the optional sampling theorem.


Optional Sampling Theorem. Suppose $M_0, M_1, \ldots$ is a uniformly integrable martingale with respect to $\{\mathcal{F}_n\}$ and $T$ is a stopping time satisfying $P\{T < \infty\} = 1$ and $E(|M_T|) < \infty$. Then $E(M_T) = E(M_0)$.


The condition of uniform integrability can be difficult to verify. There are a number of easier conditions that imply uniform integrability. We mention one here and give another in the exercises (Exercise 5.15).


Fact. If $X_1, X_2, \ldots$ is a sequence of random variables and there exists a $C < \infty$ such that $E(X_n^2) < C$ for each $n$, then the sequence is uniformly integrable.


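To prove the fact, let $\epsilon > 0$ be given and let $\delta = \epsilon^2/4C$. Suppose $P(A) < \delta$. Then

$$\begin{aligned}
E(|X_n| \, I_A) &= E[|X_n| \, I(A \cap \{|X_n| \ge 2C/\epsilon\})] + E[|X_n| \, I(A \cap \{|X_n| < 2C/\epsilon\})] \\
&\le (\epsilon/2C) \, E[|X_n|^2 \, I(A \cap \{|X_n| \ge 2C/\epsilon\})] + (2C/\epsilon) \, P(A \cap \{|X_n| < 2C/\epsilon\}) \\
&\le (\epsilon/2C) \, E(|X_n|^2) + (2C/\epsilon) \, P(A) < \epsilon.
\end{aligned}$$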


Example 1. Random Harmonic Series. It is well known that the harmonic series $1 + \frac{1}{2} + \frac{1}{3} + \cdots$ diverges while the alternating harmonic series $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$ converges. What if the pluses and minuses are chosen at random? To study this, let $X_1, X_2, \ldots$ be independent random variables with $P\{X_j = 1\} = P\{X_j = -1\} = 1/2$. Let $M_0 = 0$ and for $n > 0$,

$$M_n = \sum_{j=1}^{n} \frac{1}{j} X_j.$$

By Example 1, Section 5.2, $M_n$ is a martingale. Since $E(M_n) = 0$,

$$E(M_n^2) = \mathrm{Var}(M_n) = \sum_{j=1}^{n} \mathrm{Var}\Big(\frac{1}{j} X_j\Big) = \sum_{j=1}^{n} \frac{1}{j^2} \le \sum_{j=1}^{\infty} \frac{1}{j^2} < \infty.$$

Hence $M_n$ is a uniformly integrable martingale. The question of convergence is discussed in the next section.
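
The second-moment computation for the random harmonic series can be confirmed exactly for small $n$ by averaging $M_n^2$ over all $2^n$ sign patterns. The sketch below is illustrative; the function name `second_moment` is ours.

```python
from fractions import Fraction
from itertools import product


def second_moment(n):
    """E(M_n^2) for M_n = sum_{j<=n} X_j / j with independent fair signs X_j,
    computed by enumerating all 2^n sign patterns."""
    total = Fraction(0)
    for signs in product([-1, 1], repeat=n):
        m = sum(Fraction(s, j) for j, s in enumerate(signs, start=1))
        total += m * m
    return total / 2 ** n


n = 8
exact = second_moment(n)
partial_sum = sum(Fraction(1, j * j) for j in range(1, n + 1))
print(exact, partial_sum, float(exact))
```

The two printed fractions agree, and both stay below $\sum_j 1/j^2 = \pi^2/6 \approx 1.645$, which is the uniform bound used in the example.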


Example 2. Branching Process. Let $X_n$ denote the number of offspring in the $n$th generation of a branching process (see Section 2.4) whose offspring distribution has mean $\mu$ and variance $\sigma^2$. Then (Exercise 5.5) $M_n = \mu^{-n} X_n$ is a martingale with respect to $X_1, X_2, \ldots$. Suppose $\mu > 1$. Then (Exercise 5.11) there exists a constant $C < \infty$ such that for all $n$, $E(M_n^2) \le C$ and hence $M_n$ is a uniformly integrable martingale for $\mu > 1$.


5.5 Martingale Convergence Theorem 


The martingale convergence theorem states that under very general conditions a martingale $M_n$ converges to a limiting random variable $M_\infty$. We start by considering a particular example, Polya’s urn (Example 4, Section 5.2). In this case $M_n$ is the proportion of red balls in the urn after $n$ draws. What happens as $n$ gets large? In Exercise 5.12 it is shown that the distribution of $M_n$ is approximately a uniform distribution on $[0, 1]$ for large values of $n$. This leads to a question: Does the proportion of red balls fluctuate between 0 and 1 infinitely often or does it eventually settle down to a particular value? We will show now that the latter is true.

Let $0 < a < b < \infty$ and suppose that $M_n < a$. Let $T$ be the stopping time

$$T = \min\{j : j \ge n \text{ and } M_j \ge b\},$$

and let $T_m = \min\{T, m\}$. Then for $m > n$, the optional sampling theorem states that

$$E(M_{T_m}) = M_n < a.$$

But

$$E(M_{T_m}) \ge E(M_{T_m} \, I\{T \le m\}) = E(M_T \, I\{T \le m\}) \ge b \, P\{T \le m\}.$$

Hence,

$$P\{T \le m\} < \frac{a}{b}.$$

Since this is true for all $m$,

$$P\{T < \infty\} = \lim_{m \to \infty} P\{T \le m\} \le \frac{a}{b}.$$

This says that with probability of at least $1 - (a/b)$ the proportion of red balls never gets as high as $b$. Now suppose the proportion of red balls does get higher than $b$. What then is the probability that the proportion goes down below $a$ again? By the same argument applied to the proportion of green balls we can say that the probability of dropping below $a$ is at most $(1 - b)/(1 - a)$. By continuing this argument, we can see that, starting at $a$, the probability of going above $b$, then below $a$ again, then above $b$, then below $a$, a total of $n$ times, can be bounded above by

$$\Big(\frac{a}{b}\Big)\Big(\frac{1-b}{1-a}\Big)\Big(\frac{a}{b}\Big)\Big(\frac{1-b}{1-a}\Big) \cdots \Big(\frac{a}{b}\Big)\Big(\frac{1-b}{1-a}\Big) = \Big(\frac{a}{b}\Big)^n \Big(\frac{1-b}{1-a}\Big)^n,$$

which tends to 0 as $n \to \infty$. Hence, the proportion does not fluctuate infinitely often between $a$ and $b$. Since $a$ and $b$ are arbitrary, this shows that it is impossible for the proportion to fluctuate infinitely often between any two numbers, or, in other words, the limit

$$M_\infty = \lim_{n \to \infty} M_n$$

exists. The limit $M_\infty$ is a random variable; it is not difficult to show (see Exercise 5.12) that $M_\infty$ has a uniform distribution on $[0, 1]$.

We now state a general result.
We now state a general result. 


Martingale Convergence Theorem. Suppose $M_0, M_1, \ldots$ is a martingale
with respect to $\{\mathcal{F}_n\}$ such that there exists a $C < \infty$ with $E(|M_n|) \le C$ for
all $n$. Then there exists a random variable $M_\infty$ such that

$$M_n \to M_\infty.$$

Note that the limiting random variable $M_\infty$ is measurable with respect
to $M_0, M_1, \ldots$. The proof of the theorem is similar to the argument above.
What we will show is that for every $0 < a < b < \infty$ the probability that the
martingale fluctuates infinitely often between $a$ and $b$ is 0. Since this will be
true for every $a < b$, it must be the case that the martingale $M_n$ converges to
some value $M_\infty$.

Fix $a < b$. We will consider the following betting strategy, which is
reminiscent of the martingale betting strategy. We think of $M_n$ as giving the
cumulative results of some fair game and $M_{n+1} - M_n$ as being the result of
the game at time $n+1$. Whenever $M_n < a$, bet 1 on the martingale. Continue
this procedure until the martingale gets above $b$. Then stop betting
until the martingale drops below $a$ again and return to betting 1. Continue
this process, changing the bet to 0 when $M_n$ goes above $b$ and changing back
to 1 when $M_n$ drops below $a$. Note that if the martingale fluctuates infinitely
often between $a$ and $b$ this gives a strategy that produces a long-term gain
from the fair game.

After $n$ steps the winnings in this strategy are given by

$$W_n = \sum_{j=1}^{n} B_j (M_j - M_{j-1}),$$

where $B_j$ is the bet, which equals 1 or 0 depending on whether the martingale
was most recently below $a$ or above $b$. One can verify as in Example 3, Section
5.2 that $W_n$ is a martingale with respect to $M_0, M_1, \ldots$. We note that

$$W_n \ge (b-a) U_n - |M_n - a|,$$

where $U_n$ denotes the number of times up to time $n$ that the martingale goes
from below $a$ to above $b$ (this is often called the number of upcrossings) and
$|M_n - a|$ gives an estimate for the amount lost in the last interval (this is
relevant if the bettor is betting 1 at time $n$). Since $W_n$ is a martingale we have

$$E(W_0) = E(W_n) \ge (b-a) E(U_n) - E(|M_n - a|).$$

Since $E(|M_n - a|) \le E(|M_n|) + a \le C + a$, we get

$$E(U_n) \le \frac{E(W_0) + C + a}{b-a}.$$

Since this holds for every $n$, the expected number of upcrossings up to infinity
is bounded and hence with probability one the number of upcrossings is finite.
This proves the theorem.
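
The betting strategy in the proof can be traced on a concrete path. The following is a minimal sketch, not part of the original argument: the function names `upcrossing_winnings` and `simple_random_walk`, and the choice of a simple random walk as the martingale, are assumptions made for illustration. It computes the winnings and the number of upcrossings for one path, so that the pathwise inequality relating winnings and upcrossings can be checked directly.

```python
import random

def upcrossing_winnings(path, a, b):
    """Return (winnings, upcrossings) for the strategy in the proof:
    bet 1 once the path has dropped below a, bet 0 once it has risen above b."""
    betting = False
    winnings = 0.0
    upcrossings = 0
    for prev, cur in zip(path, path[1:]):
        # The bet on this step may depend only on the path up to `prev`.
        if not betting and prev < a:
            betting = True
        elif betting and prev > b:
            betting = False
            upcrossings += 1
        if betting:
            winnings += cur - prev
    # An upcrossing that is completed exactly at the final time still counts.
    if betting and path[-1] > b:
        upcrossings += 1
    return winnings, upcrossings

def simple_random_walk(n, rng):
    """A fair +1/-1 walk started at 0 (an illustrative martingale)."""
    path = [0]
    for _ in range(n):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

if __name__ == "__main__":
    rng = random.Random(1)
    a, b = -1.5, 1.5
    path = simple_random_walk(1000, rng)
    w, u = upcrossing_winnings(path, a, b)
    # The inequality W_n >= (b - a) U_n - |M_n - a| holds on every path.
    print(w, u, w >= (b - a) * u - abs(path[-1] - a))
```

Each completed upcrossing contributes more than $b - a$ to the winnings, and an unfinished run of betting loses at most $|M_n - a|$, which is why the inequality holds path by path rather than only in expectation.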

The martingale property implies that for every $n$, $E(M_n) = E(M_0)$. It is
not necessarily true, however, that $E(M_\infty) = E(M_0)$. For a counterexample,
we return to the martingale betting strategy. In this case

$$W_\infty = \lim_{n \to \infty} W_n = 1,$$

and hence $E(W_\infty) \ne E(W_0) = 0$. If the martingale is uniformly integrable,
it is true that the limiting random variable has the same expectation (see
Exercise 5.13).


Fact. If $M_n$ is a uniformly integrable martingale with respect to $X_0, X_1, \ldots$,
then

$$M_\infty = \lim_{n \to \infty} M_n$$

exists and $E(M_\infty) = E(M_0)$.

Example 1. Let $X_n$ be the number of individuals in the $n$th generation
of a branching process whose offspring distribution has mean $\mu$ and variance
$\sigma^2$. Assume $X_0 = 1$ and let $M_n = \mu^{-n} X_n$ be the associated martingale.
If $\mu \le 1$, we know that extinction occurs with probability one and hence
$M_n \to M_\infty = 0$. In this case $E(M_\infty) \ne E(M_0)$. In Section 5.4, we noted that
$M_n$ is uniformly integrable if $\mu > 1$, and hence $M_\infty$ is a nontrivial random
variable with $E(M_\infty) = 1$.


Example 2. Let $X_1, X_2, \ldots$ be independent random variables with
$P\{X_j = 1\} = P\{X_j = -1\} = 1/2$ and let $M_n$ be the random harmonic series

$$M_n = \sum_{j=1}^{n} \frac{X_j}{j}.$$

It was noted in Section 5.4 that $M_n$ is a uniformly integrable martingale.
Hence $M_n$ approaches a random variable $M_\infty$. This says that the random
harmonic series converges with probability one.


Example 3. Let $M_n$ be the proportion of red balls in Polya's urn. In this
case, suppose that at time $n = 0$ there are $k$ red balls and $m$ green balls (so
after $n$ draws there are $n + k + m$ balls). Since $M_n$ is bounded it is easy
to see that $M_n$ is a uniformly integrable martingale and $M_n$ approaches a
random variable $M_\infty$ with $E(M_\infty) = E(M_0) = k/(k+m)$. It can be shown
(see Example 7 below) that the distribution of $M_\infty$ is a beta distribution with
parameters $k$ and $m$, i.e., it has density

$$\frac{(k+m-1)!}{(k-1)!\,(m-1)!}\, x^{k-1} (1-x)^{m-1}, \qquad 0 < x < 1.$$

Example 4. Let $M_n$ be a martingale with respect to $X_0, X_1, \ldots$, and let $T$
be a stopping time with $P\{T < \infty\} = 1$. Let $T_n = \min\{n, T\}$ and $Y_n = M_{T_n}$.
Then $Y_n \to Y_\infty$ where $Y_\infty = M_T$. As we saw in the optional sampling
theorem, it is not always the case that $E(Y_\infty) = E(Y_0)$. However, this is true
if $M_n$ is uniformly integrable.


Example 5. Let $X_n$ be an irreducible Markov chain on a countably infinite
state space $S$ with transition function $p(x,y)$. A function $f$ is called harmonic
at $x$ if

$$f(x) = \sum_{y \in S} p(x,y) f(y).$$

In Chapter 2 we considered the problem of determining whether or not the
chain was recurrent. We now prove one of the assertions we made there.
Suppose $z$ is a fixed state in $S$ and let $u(x)$ denote the probability starting at
state $x$ that the chain ever reaches state $z$. In other words, if

$$T = \min\{j \ge 0 : X_j = z\},$$

then

$$u(x) = P\{T < \infty \mid X_0 = x\}.$$

As we noted then, $u(z) = 1$ and $u(x)$ is harmonic at any $x \ne z$.
Suppose now that we can find some function $v$ that satisfies:

$$v(z) = 1, \tag{5.13}$$

$$0 \le v(x) \le 1, \tag{5.14}$$

$$v(x) = \sum_{y \in S} p(x,y) v(y), \qquad x \ne z. \tag{5.15}$$

If $T$ is defined as above, and $T_n = \min\{n, T\}$, one can check that $M_n =
v(X_{T_n})$ is a martingale with respect to $X_0, X_1, \ldots$. Since $v$ is bounded, $M_n$
is uniformly integrable and

$$\lim_{n \to \infty} M_n = M_\infty$$

exists with $E(M_\infty) = E(M_0)$.

If the chain is recurrent, then $P\{T < \infty\} = 1$ and $M_\infty = v(z) = 1$. Hence
if $X_0 = x$, then $1 = E(M_\infty) = E(M_0) = v(x)$. Thus, if the chain is recurrent
there is no nontrivial solution to equations (5.13) through (5.15), that is, no
solution other than $v \equiv 1$.


Example 6. Let $X_1, X_2, \ldots$ be independent random variables with

$$P\{X_j = 3/2\} = P\{X_j = 1/2\} = \frac{1}{2}.$$

Let $M_0 = 1$ and for $n > 0$, let $M_n = X_1 \cdots X_n$. Note that $E(M_n) =
E(X_1) \cdots E(X_n) = 1$, and in fact, if $\mathcal{F}_n$ denotes the information contained in
$X_1, \ldots, X_n$,

$$\begin{aligned}
E(M_{n+1} \mid \mathcal{F}_n) &= E(X_1 \cdots X_{n+1} \mid \mathcal{F}_n) \\
&= X_1 \cdots X_n\, E(X_{n+1} \mid \mathcal{F}_n) \\
&= X_1 \cdots X_n\, E(X_{n+1}) = M_n.
\end{aligned}$$

Hence $M_n$ is a martingale with respect to $X_1, X_2, \ldots$. Since $E(|M_n|) =
E(M_n) = 1$, the conditions of the martingale convergence theorem hold and
hence

$$M_n \to M_\infty$$

for some random variable $M_\infty$. Is $M_n$ uniformly integrable? The answer is no;
in fact, the limiting random variable $M_\infty = 0$ [and hence $E(M_\infty) \ne E(M_0)$].
To see this, consider the logarithm of the martingale,

$$\ln M_n = \sum_{j=1}^{n} \ln X_j.$$

The right-hand side is the sum of independent identically distributed random
variables with mean

$$E(\ln X_1) = \frac{1}{2} \ln \frac{1}{2} + \frac{1}{2} \ln \frac{3}{2} = \frac{1}{2} \ln \frac{3}{4} < 0.$$

By the law of large numbers, $\ln M_n \to -\infty$ and hence $M_n \to 0$.
Note in this case

$$E(M_n^2) = E(X_1^2) \cdots E(X_n^2) = (5/4)^n,$$

so the second moment is not uniformly bounded.
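
The contrast between a constant mean and a limit of zero can be made concrete by an exact computation. The sketch below is illustrative only: the function name `product_martingale_summary` and the threshold `eps` are assumptions, and the computation uses the fact that the number of factors equal to 3/2 among the first $n$ is binomial with parameters $n$ and $1/2$.

```python
import math

def product_martingale_summary(n, eps=0.01):
    """Exact E[M_n], E[M_n^2] and P{M_n < eps} for M_n = X_1 ... X_n,
    where each X_j equals 3/2 or 1/2 with probability 1/2."""
    mean = 0.0
    second_moment = 0.0
    prob_small = 0.0
    for k in range(n + 1):                   # k = number of factors equal to 3/2
        weight = math.comb(n, k) / 2 ** n    # binomial(n, 1/2) probability of k
        value = 1.5 ** k * 0.5 ** (n - k)    # value of M_n on that event
        mean += weight * value
        second_moment += weight * value ** 2
        if value < eps:
            prob_small += weight
    return mean, second_moment, prob_small

if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, product_martingale_summary(n))
```

For moderately large $n$ the mean stays at 1 and the second moment grows like $(5/4)^n$, while almost all of the probability sits on very small values of $M_n$; the mean is carried by rare, very large values, which is exactly the failure of uniform integrability.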


Example 7. A typical problem in statistics is to estimate the mean $\theta$ of a
distribution given independent samples

$$Y_1, Y_2, Y_3, \ldots$$

from the distribution. In Bayesian statistics, the parameter $\theta$ is taken to be
a random variable with a certain distribution, called the prior distribution.
Assume that $E[\theta] = \mu$ under the prior distribution. Let $M_0 = \mu$ and

$$M_n = E[\theta \mid Y_1, \ldots, Y_n].$$

Then $M_n$ is a martingale. The conditional distribution of $\theta$ given $Y_1 =
y_1, \ldots, Y_n = y_n$ is called the posterior distribution. The martingale
convergence theorem tells us that

$$\lim_{n \to \infty} M_n = M_\infty,$$

for some random variable $M_\infty$ which depends on the infinite sequence of values
$\{Y_1, Y_2, \ldots\}$. Moreover, it can be shown that $M_n = E[M_\infty \mid Y_1, \ldots, Y_n]$. The
strong law of large numbers tells us that for fixed $\theta$,

$$\lim_{n \to \infty} \frac{Y_1 + \cdots + Y_n}{n} = \theta.$$

That is, the random variable $\theta$ can be determined from the infinite sequence
of values. This gives $M_\infty = \theta$,

$$\lim_{n \to \infty} E[\theta \mid Y_1, \ldots, Y_n] = \theta.$$

As an example assume that $Y_1, Y_2, \ldots$ are independent samples from a
Bernoulli distribution with $P\{Y_i = 1\} = 1 - P\{Y_i = 0\} = \theta$. If we have
no a priori knowledge about $\theta$ we might assume that $\theta$ is a random variable
uniformly distributed on $[0,1]$. For fixed $\theta$,

$$P\{Y_1 + \cdots + Y_n = k\} = \binom{n}{k} \theta^k (1-\theta)^{n-k}.$$

Let $f_n(\theta \mid k)$ denote the conditional density of $\theta$ given $Y_1 + \cdots + Y_n = k$.
Bayes theorem shows that

$$f_n(\theta \mid k) = \frac{\binom{n}{k} \theta^k (1-\theta)^{n-k}}{\int_0^1 \binom{n}{k} \theta_1^k (1-\theta_1)^{n-k}\, d\theta_1} = \frac{(n+1)!}{k!\,(n-k)!}\, \theta^k (1-\theta)^{n-k}.$$

This is called the beta distribution with parameters $k+1$ and $n-k+1$. A
straightforward calculation shows that the mean of this distribution is
$(k+1)/(n+2)$. Note that, with $k = Y_1 + \cdots + Y_n$,

$$E[\theta \mid Y_1, \ldots, Y_n] = \int_0^1 \theta\, f_n(\theta \mid k)\, d\theta = \frac{k+1}{n+2}.$$

If we let $k + 1$ represent the number of red balls in an urn and $(n-k)+1$ the
number of green balls in the urn, we have exactly the transitions for Polya's
urn.
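
The connection with Polya's urn suggests a quick numerical experiment. The following sketch is only an illustration under stated assumptions: the function names `polya_fraction` and `histogram`, the number of draws, and the number of bins are all choices made here. It simulates the urn started with one red and one green ball and tabulates the fraction of red balls, which should look roughly uniform on $[0,1]$.

```python
import random

def polya_fraction(draws, rng, red=1, green=1):
    """Fraction of red balls after `draws` draws from Polya's urn."""
    for _ in range(draws):
        # Draw a ball at random; return it together with one more of its color.
        if rng.random() < red / (red + green):
            red += 1
        else:
            green += 1
    return red / (red + green)

def histogram(values, bins=20):
    """Counts of values in equal-width bins covering [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts

if __name__ == "__main__":
    rng = random.Random(0)
    fractions = [polya_fraction(998, rng) for _ in range(2000)]
    print(histogram(fractions))
```

At each step the probability of adding a red ball is (number of red balls)/(total), which matches the posterior mean $(k+1)/(n+2)$ when $k$ of the first $n$ draws were red.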


5.6 Maximal Inequalities


If $M_0, M_1, M_2, \ldots$ is a sequence of random variables, define the maximum
processes by

$$\overline{M}_n = \max\{M_0, \ldots, M_n\}, \qquad M_n^* = \max\{|M_0|, \ldots, |M_n|\}.$$

Maximal inequalities relate probabilities or expectations for $\overline{M}_n$, $M_n^*$ to those
for $M_n$ or $|M_n|$. We give two examples here, the reflection principle and the
Doob maximal inequality. The basic idea of the proofs is the following: if $M_n$
is a martingale or a submartingale and $M_j$ is large for some $j \le n$, then there
is a good chance that $M_n$ will be large as well. Stopping times are used to
make these arguments precise.


Reflection Principle. Suppose $X_1, X_2, \ldots$ are independent random variables
whose distribution is symmetric about the origin. Let $M_0 = 0$, $M_n =
X_1 + \cdots + X_n$. Then for every $a > 0$,

$$P\{\overline{M}_n \ge a\} \le 2\, P\{M_n \ge a\}.$$

To prove this, let $T$ be the smallest $j$ such that $M_j \ge a$ and note that

$$P\{\overline{M}_n \ge a\} = \sum_{j=0}^{n} P\{T = j\},$$

and

$$\begin{aligned}
P\{M_n \ge a\} &= \sum_{j=0}^{n} P\{T = j,\ M_n \ge a\} \\
&= \sum_{j=0}^{n} P\{T = j\}\, P\{M_n \ge a \mid T = j\}.
\end{aligned}$$

However, independence and symmetry of the distribution of $X_1, X_2, \ldots$ show
that

$$\begin{aligned}
P\{M_n \ge a \mid T = j\} &\ge P\{M_n - M_j \ge 0 \mid T = j\} \\
&= P\{M_n - M_j \ge 0\} \ge \frac{1}{2}.
\end{aligned}$$
Doob's Maximal Inequality. Suppose $M_0, M_1, M_2, \ldots$ is a nonnegative
submartingale with respect to $\mathcal{F}_n$. Then for every $a > 0$,

$$P\{\overline{M}_n \ge a\} \le \frac{E[M_n]}{a}.$$

This inequality can be considered as a generalization of the inequality

$$P\{M_n \ge a\} \le \frac{E[M_n]}{a}.$$

To prove the maximal inequality, we again let $T$ be the smallest $j$ with $M_j \ge a$
and let $A_j$ denote the $\mathcal{F}_j$-measurable event $\{T = j\}$. Since $M_n$ is nonnegative
we can write

$$E[M_n] \ge E[M_n I\{T \le n\}] = \sum_{j=0}^{n} E[M_n I_{A_j}],$$

where $I$ denotes the indicator function. However, since $A_j$ is $\mathcal{F}_j$-measurable,
properties of conditional expectation can be used to see that

$$\begin{aligned}
E[M_n I_{A_j}] &= E[E(M_n I_{A_j} \mid \mathcal{F}_j)] = E[E(M_n \mid \mathcal{F}_j)\, I_{A_j}] \\
&\ge E[M_j I_{A_j}] \\
&\ge E[a I_{A_j}] = a\, P(A_j).
\end{aligned}$$

Therefore,

$$E[M_n] \ge \sum_{j=0}^{n} a\, P(A_j) = a\, P\{\overline{M}_n \ge a\}.$$

If $M_0, M_1, \ldots$ is a martingale with respect to $\mathcal{F}_n$, not necessarily
nonnegative, we cannot apply this inequality immediately. However, if $r \ge 1$, and
$E[|M_n|^r] < \infty$ for all $n$, then $|M_n|^r$ is a submartingale. To check this we need
only establish the following fact about conditional expectations: if $r \ge 1$,

$$E[\,|Y|^r \mid \mathcal{F}_n] \ge |E[Y \mid \mathcal{F}_n]|^r, \tag{5.16}$$

for then

$$E[\,|M_{n+1}|^r \mid \mathcal{F}_n] \ge |E(M_{n+1} \mid \mathcal{F}_n)|^r = |M_n|^r.$$

Also, if $E[e^{bY}] < \infty$, then

$$E[e^{bY} \mid \mathcal{F}_n] \ge e^{E[bY \mid \mathcal{F}_n]}, \tag{5.17}$$

and hence for every $b$,

$$E[e^{bM_{n+1}} \mid \mathcal{F}_n] \ge e^{E(bM_{n+1} \mid \mathcal{F}_n)} = e^{bM_n}.$$

This shows that $e^{bM_n}$ is a submartingale, assuming $E[e^{bM_n}] < \infty$. We leave
the derivation of (5.16) and (5.17) to Exercise 5.3, but we state the conclusion
here.


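Doob's Maximal Inequality. Suppose $M_0, M_1, M_2, \ldots$ is a martingale with
respect to $\mathcal{F}_n$. Then for every $a, b > 0$ and $r \ge 1$,

$$P\{M_n^* \ge a\} \le \frac{E[|M_n|^r]}{a^r},$$

$$P\{\overline{M}_n \ge a\} \le \frac{E[e^{bM_n}]}{e^{ba}}.$$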


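Example. Let $S_n = X_1 + \cdots + X_n$ denote simple random walk in $\mathbb{Z}$, and let
$b = 1/\sqrt{n}$. Since $S_n$ is a martingale, we get

$$P\{\max\{S_1, \ldots, S_n\} \ge a\sqrt{n}\} \le e^{-a}\, E[e^{S_n/\sqrt{n}}].$$

But,

$$E[e^{S_n/\sqrt{n}}] = E[e^{(X_1 + \cdots + X_n)/\sqrt{n}}] = \left(E[e^{X_1/\sqrt{n}}]\right)^n = \left(\frac{e^{1/\sqrt{n}} + e^{-1/\sqrt{n}}}{2}\right)^n.$$

Taylor series shows that

$$\frac{e^{1/\sqrt{n}} + e^{-1/\sqrt{n}}}{2} = 1 + \frac{1}{2n} + O\!\left(\frac{1}{n^2}\right).$$

Therefore,

$$\lim_{n \to \infty} E[e^{S_n/\sqrt{n}}] = \lim_{n \to \infty} \left[1 + \frac{1}{2n} + O\!\left(\frac{1}{n^2}\right)\right]^n = e^{1/2}.$$

Hence, there is a $C < \infty$ such that for all $n$ sufficiently large and for all $a > 0$,

$$P\{\max\{S_1, \ldots, S_n\} \ge a\sqrt{n}\} \le C e^{-a}. \tag{5.18}$$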
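
The two ingredients of this example can be checked numerically. The sketch below is an illustration, not part of the original text: the function names `mgf_scaled_walk` and `prob_max_at_least` are assumptions, and the exact maximum probability is computed by a simple dynamic program that absorbs the walk at the target level.

```python
import math

def mgf_scaled_walk(n):
    """E[exp(S_n / sqrt(n))] for simple random walk, i.e. cosh(1/sqrt(n)) ** n."""
    return math.cosh(1.0 / math.sqrt(n)) ** n

def prob_max_at_least(n, level):
    """Exact P{max(S_1, ..., S_n) >= level} for an integer level >= 1,
    found by stopping the walk the first time it reaches `level`."""
    dist = {0: 1.0}      # distribution of the walk on paths that have not reached level
    absorbed = 0.0       # probability that the level has been reached so far
    for _ in range(n):
        nxt = {}
        for pos, p in dist.items():
            for step in (-1, 1):
                q = pos + step
                if q >= level:
                    absorbed += p / 2
                else:
                    nxt[q] = nxt.get(q, 0.0) + p / 2
        dist = nxt
    return absorbed

if __name__ == "__main__":
    n = 100
    print(mgf_scaled_walk(n), math.exp(0.5))
    for a in (1, 2, 3):
        level = math.ceil(a * math.sqrt(n))   # the maximum is an integer
        print(a, prob_max_at_least(n, level), mgf_scaled_walk(n) * math.exp(-a))
```

Since $\cosh x \le e^{x^2/2}$ for all real $x$, the moment generating function here never exceeds $e^{1/2}$, so for simple random walk the constant $C = e^{1/2}$ works in (5.18) for every $n$, not only for large $n$.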


5.7 Exercises 


5.1 Consider the experiment of rolling two dice. Let X be the value of the 
first roll and Y the sum of the two dice. Find E(X | Y), i.e., give the value 
of E(X | Y)(y) for all y. 


5.2 Suppose that $X_t$ is a Poisson process with parameter $\lambda = 1$. Find
$E(X_1 \mid X_2)$ and $E(X_2 \mid X_1)$.


5.3 A function $f : \mathbb{R} \to \mathbb{R}$ is convex if for every $0 < p < 1$ and $x < y$,

$$f(px + (1-p)y) \le p f(x) + (1-p) f(y).$$

(a) Show that if $f''(x) \ge 0$ for all $x$, then $f$ is convex.

(b) Show that if $r \ge 1$, then $f(x) = |x|^r$ is convex.

(c) Show that if $b$ is a real number, then $f(x) = e^{bx}$ is convex.

(d) Show that if $f$ is convex; $p_1, \ldots, p_n$ are nonnegative numbers summing
to 1; and $x_1, \ldots, x_n$ are real numbers, then

$$f\!\left(\sum_{j=1}^{n} p_j x_j\right) \le \sum_{j=1}^{n} p_j f(x_j).$$

(e) Establish Jensen's inequality: for any random variable $X$, $E[f(X)] \ge
f(E[X])$, assuming the expectations exist.

(f) Show that if $Y$ is a discrete random variable and $X$ is as in (e), then
$E(f(X) \mid Y) \ge f(E(X \mid Y))$. (Note: this fact can then be established for $Y$
that are not discrete by a limit process.)


5.4 Let $X_1, X_2, X_3, \ldots$ be independent identically distributed random variables.
Let $m(t) = E(e^{tX_1})$ be the moment generating function of $X_1$ (and
hence of each $X_i$). Fix $t$ and assume $m(t) < \infty$. Let $S_0 = 0$ and for $n > 0$,

$$S_n = X_1 + \cdots + X_n.$$

Let $M_n = m(t)^{-n} e^{tS_n}$. Show that $M_n$ is a martingale with respect to
$X_1, X_2, \ldots$.


5.5 Let $X_0, X_1, \ldots$ be the values of a branching process as in Chapter 2,
Section 2.4, i.e., $X_n$ gives the number of individuals in the $n$th generation.
Suppose that the mean number of offspring per individual is $\mu$. Show that
$M_n = \mu^{-n} X_n$ is a martingale with respect to $X_0, X_1, \ldots$.


5.6 COMPUTER SIMULATION 

(a) Consider the Polya urn model. Simulate this model with a computer by 
starting with one red and one green ball and continuing until the number of 
balls in the urn is 1000. Note the fraction of red balls in the 1000 balls. Do 
this simulation at least 2000 times and note how many times the fraction of 
red balls is in the intervals [0, .05), [.05,.1),... ,[.95,1). From the simulation 
data, make a conjecture as to what the distribution of the fraction of red balls 
looks like. 

(b) Do another simulation of the Polya urn model. Again, start with one
red and one green ball and continue until there are 1000 balls in the urn. Note
the proportion of red balls at this time and then continue until there are 2000
balls. Compare these two numbers (i.e., compare $M_{998}$ and $M_{1998}$). Do this
at least 100 times.


5.7 Consider a biased random walk on the integers with probability $p < 1/2$
of moving to the right and probability $1 - p$ of moving to the left. Let $S_n$ be
the value at time $n$ and assume that $S_0 = a$, where $0 < a < N$.

(a) Show that $M_n = [(1-p)/p]^{S_n}$ is a martingale.
(b) Let $T$ be the first time that the random walk reaches 0 or $N$, i.e.,

$$T = \min\{n : S_n = 0 \text{ or } N\}.$$

Use optional sampling on the martingale $M_n$ to compute $P\{S(T) = 0\}$.


5.8 Let $S_n$ be as in Exercise 5.7.
(a) Show that $M_n = S_n + (1 - 2p)n$ is a martingale.
(b) Let $T$ be the first time that the random walk reaches 0 or $N$, i.e.,

$$T = \min\{n : S_n = 0 \text{ or } N\}.$$

Let $T_n = \min\{n, T\}$ and let $Z_n$ be the martingale $Z_n = M_{T_n}$. Show that
there exists a $C < \infty$ such that $E(Z_n^2) \le C$ for all $n$. You may wish to use
Exercise 1.7.
(c) Apply the optional sampling theorem to $E(M_T)$ and use this and the
result from Exercise 5.7 to find the expected number of steps until absorption,
$E(T)$.


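5.9 Suppose $X_n$ is an irreducible Markov chain on finite state space $S$ with
transition matrix $P$. Suppose $A$ is a subset of $S$ and $F : A \to \mathbb{R}$ and
$g : S \setminus A \to \mathbb{R}$ are given functions. Let $T = \min\{n : X_n \in A\}$ and
$T_n = \min\{n, T\}$. Suppose $f : S \to \mathbb{R}$ is a function satisfying:

$$f(x) = F(x), \quad x \in A,$$

$$Pf(x) - f(x) = g(x), \quad x \in S \setminus A.$$

(a) Show that

$$M_n = f(X_{T_n}) - \sum_{j=0}^{T_n - 1} g(X_j)$$

is a martingale.
(b) Use optional sampling to conclude that

$$f(x) = E\left[F(X_T) - \sum_{j=0}^{T-1} g(X_j) \,\Big|\, X_0 = x\right].$$

(Hint: Exercise 1.7 could be useful.)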


5.10 Let $S_n$ be as in Exercise 5.7 and let $\mathcal{F}_n$ denote the information in
$S_1, \ldots, S_n$. Let

$$M_n = \frac{1}{[4p(1-p)]^{n/2}} \left(\frac{1-p}{p}\right)^{S_n/2}.$$

(a) Show that $M_n$ is a martingale with respect to $\mathcal{F}_n$.

(b) Show that $M_n S_n$ is a martingale with respect to $\mathcal{F}_n$.

(c) Suppose that $R_n$ is a process such that $R_0 = M_0$ and both $R_n$ and
$R_n S_n$ are martingales with respect to $\mathcal{F}_n$. Show that $R_n = M_n$ for all $n$.


5.11 Let $X_n$ be the number of individuals in the $n$th generation of a branching
process in which each individual produces offspring from a distribution
with mean $\mu$ and variance $\sigma^2$. We have seen previously that $M_n = \mu^{-n} X_n$ is
a martingale.

(a) Let $\mathcal{F}_n$ denote the information contained in $X_0, \ldots, X_n$. Show that

$$E(X_{n+1}^2 \mid \mathcal{F}_n) = \mu^2 X_n^2 + \sigma^2 X_n.$$

(b) Suppose $\mu > 1$. Show that there exists a $C < \infty$ such that for all $n$,
$E(M_n^2) \le C$.
(c) Show that this is not the case if $\mu \le 1$.


5.12 Consider the Polya urn problem. Let $M_n$ be the proportion of red balls
after $n$ draws (starting with one red and one green ball). Prove by induction
on $n$ that

$$P\left\{M_n = \frac{k}{n+2}\right\} = \frac{1}{n+1}, \qquad k = 1, 2, \ldots, n+1.$$


5.13 Suppose $X_1, X_2, \ldots$ are uniformly integrable with $X_n \to Y$ with
probability one. Show that $E(X_n) \to E(Y)$.


5.14 Let $X_1, X_2, \ldots$ be independent, identically distributed random variables
taking values in $\{-1, 0, 1, \ldots\}$ with mean $\mu < 0$. Let $S_0 = 1$ and for
$n > 0$,

$$S_n = 1 + X_1 + \cdots + X_n.$$

Let $T = \min\{n : S_n = 0\}$. By the law of large numbers, we know that
$P\{T < \infty\} = 1$. Show that $E(T) \le 1/|\mu|$. [Hint: it suffices to prove for
each $n$, if $T_n = \min\{n, T\}$, then $E(T_n) \le 1/|\mu|$. Consider the martingale
$M_n = S_n - n\mu$.] Exercise 5.16 below can be used to prove that $E(T) = 1/|\mu|$.


5.15 Let $M_n$ be a martingale with respect to $\mathcal{F}_n$. Assume there exists a
nonnegative random variable $Y$ with $E(Y) < \infty$ and $|M_n| \le Y$ for all $n$.
Show that $M_n$ is a uniformly integrable martingale.


5.16 Let $X_1, X_2, \ldots$ be independent, identically distributed random variables
with mean $\mu$. Let $T$ be a stopping time with respect to $X_1, X_2, \ldots$ with
$E(T) < \infty$.

(a) Let

$$Y = \sum_{n=1}^{\infty} |X_n|\, I\{T \ge n\},$$

where $I$ denotes the indicator function. Show that $E(Y) < \infty$.
(b) Let $T_n = \min\{n, T\}$ and

$$M_n = X_1 + \cdots + X_{T_n} - \mu T_n.$$

Explain why $M_n$ is a uniformly integrable martingale (see Exercise 5.15).
(c) Prove Wald's equation,

$$E\left(\sum_{j=1}^{T} X_j\right) = \mu\, E(T). \tag{5.19}$$

(d) Suppose $\{\mathcal{F}_n\}$ is a filtration such that $X_n$ is $\mathcal{F}_n$-measurable and for
$m > n$, $X_m$ is independent of $\mathcal{F}_n$ (i.e., $X_m$ is independent of every
$\mathcal{F}_n$-measurable random variable). Suppose that $T$ is a stopping time with respect
to $\{\mathcal{F}_n\}$. (In other words, more information than $X_1, \ldots, X_n$ is used to
determine whether to stop at time $n$. However, any additional information
used is independent of $X_{n+1}, X_{n+2}, \ldots$.) Show that (a) through (c) still hold.


5.17 Let $S_n$ be simple random walk in $\mathbb{Z}$.
(a) Show that for every $\beta > 0$ there is a $C_\beta < \infty$ such that for all positive
integers $n$ and all $a > 0$,

$$P\{\max\{S_1, \ldots, S_n\} \ge a\sqrt{n}\} \le C_\beta\, e^{-\beta a}.$$

(Hint: follow the derivation of (5.18) using $b = \beta/\sqrt{n}$.)
(b) Show that for every $c > 0$,

$$\sum_{n=1}^{\infty} P\{S_n \ge c \sqrt{n} \log n\} < \infty.$$

(c) Use this to show that with probability one,

$$\lim_{n \to \infty} \frac{S_n}{\sqrt{n}\, (\log n)} = 0.$$


Renewal Processes


6.1 Introduction 


Let $T_1, T_2, \ldots$ be independent, identically distributed, nonnegative random
variables with distribution function $F(x) = P\{T_i \le x\}$. We will think of the
random variables $T_i$ as being the lifetimes of a component or as the times
between occurrences of some event. The renewal process associated with $T_i$
is the process that counts the number of events that have occurred by time
$t$. More precisely, the renewal process $N_t$ is defined by $N_t = 0$ for $t < T_1$ and
otherwise

$$N_t = \max\{n : T_1 + \cdots + T_n \le t\}.$$

We are assuming that at time 0 we are at the beginning of a lifetime. Sometimes
we will consider a slightly more general process where the process at
time 0 is in the middle of a lifetime. We let $Y$ be a nonnegative random
variable independent of $T_1, T_2, \ldots$, with perhaps a different distribution. We
think of $Y$ as the time until the first event, and then the waiting times for
later events are given by the $T_i$. More precisely, we set $N_t = 0$ for $t < Y$, and
for $t \ge Y$,

$$N_t = \min\{n : Y + T_1 + \cdots + T_n > t\}. \tag{6.1}$$

We will assume that the random variables $T_i$ have finite, positive mean and
we set

$$\mu = E(T_i).$$


Example 1. Poisson Process. Consider the Poisson process with rate
parameter $\lambda$. The waiting times $T_1, T_2, \ldots$ are independent, exponential random
variables with parameter $\lambda$ and $N_t$ is the Poisson process. In this case

$$\mu = 1/\lambda.$$


Example 2. Let $X_n$ be an irreducible, positive recurrent, discrete-time
Markov chain starting in state $x$. Let

$$T_1 = \min\{n > 0 : X_n = x\},$$

and for $i > 1$ let

$$T_i = \min\{n > 0 : X_{T_1 + \cdots + T_{i-1} + n} = x\}.$$

In other words, $T_i$ measures the amount of time between the $(i-1)$st return
and the $i$th return to state $x$. In general it is difficult to determine the
distribution function $F$ for $T_i$ given the transition matrix for the chain. We
noted previously [see (1.11)] that

$$\mu = E(T_i) = \frac{1}{\pi(x)},$$

where $\pi$ denotes the invariant probability measure for the chain. If we instead
start the chain at some state $y \ne x$ we can define

$$Y = \min\{n > 0 : X_n = x\},$$

$$T_1 = \min\{n > 0 : X_{Y+n} = x\},$$

and recursively,

$$T_i = \min\{n > 0 : X_{Y + T_1 + \cdots + T_{i-1} + n} = x\}.$$


Example 3. Let $X_t$ be an irreducible, positive recurrent, continuous-time
Markov chain starting in state $x$. Define

$$R_1 = \inf\{t > 0 : X_t \ne x\},$$

$$S_1 = \inf\{t > 0 : X_{R_1 + t} = x\},$$

$$T_1 = R_1 + S_1,$$

and in general

$$R_i = \inf\{t > 0 : X_{T_1 + \cdots + T_{i-1} + t} \ne x\},$$

$$S_i = \inf\{t > 0 : X_{T_1 + \cdots + T_{i-1} + R_i + t} = x\},$$

$$T_i = R_i + S_i.$$

The random variables $R_i$ are exponential with parameter $\alpha(x)$, the rate at
which the chain is changing from state $x$. The distribution of the $S_i$, and
hence the $T_i$, is not so easy to determine.

Example 4. M/G/1 Queue. Suppose we have a queue with a single server.
Customers arrive according to a Poisson process with rate $\lambda$, i.e., the waiting
times between customer arrivals are independent exponential random variables
with parameter $\lambda$. We will assume that the service times for customers
are independent, identically distributed random variables with mean $\mu$. However,
we will not assume that the service times are exponential (in most cases
of interest one does not expect that the service time should have the "loss of
memory" property so an exponential distribution is not appropriate). The G
in M/G/1 stands for "general" (service distribution).

If we let $Y_t$ denote the number of people in the queue at time $t$, then $Y_t$
is not a Markov process. However there is a natural renewal process one can
associate with the queue. Suppose $Y_0 = 0$. Let

$$R_1 = \inf\{t > 0 : Y_t = 1\},$$

$$S_1 = \inf\{t > 0 : Y_{R_1 + t} = 0\},$$

$$T_1 = R_1 + S_1.$$

Similarly, we define for $i > 1$,

$$R_i = \inf\{t > 0 : Y_{T_1 + \cdots + T_{i-1} + t} = 1\},$$

$$S_i = \inf\{t > 0 : Y_{T_1 + \cdots + T_{i-1} + R_i + t} = 0\},$$

$$T_i = R_i + S_i.$$

Note that the variables $R_i$ are exponential with rate $\lambda$, but the distribution
of the $S_i$ can be very complicated. Nevertheless, under the assumption that
$E(T_i) < \infty$, we can see that $T_1, T_2, \ldots$ satisfy the conditions for a renewal
process. We can think of the time represented by the $R_i$ as the "idle times"
and the time represented by the $S_i$ as the "busy times."


Suppose we have a renewal process $N_t$ corresponding to the random variables
$T_1, T_2, \ldots$. In general, $N_t$ is not a Markov process; in order to predict
when the next occurrence will happen we need to know when the last occurrence
took place. For this reason it is natural to consider the "age process"

$$A_t = \begin{cases} t, & \text{if } N_t = 0, \\ t - [T_1 + \cdots + T_{N_t}], & \text{if } N_t > 0. \end{cases}$$

The process $(N_t, A_t)$ can be thought of as a Markov process. The Poisson
process is a special example of a renewal process that is a Markov process;
for the Poisson process the probability of an event occurring in the interval
$[t, t + \Delta t]$ is independent of $A_t$. This follows from the "loss of memory"
property associated with the exponential distribution.

Our first result for renewal processes will be the analogue of the (strong)
law of large numbers. Recall that the law of large numbers states that with
probability 1,

$$\lim_{n \to \infty} \frac{T_1 + \cdots + T_n}{n} = \mu.$$

In terms of the renewal process $N_t$ this states that for all $\epsilon > 0$, if $n$ is
sufficiently large,

$$N_{\mu n (1 - \epsilon)} \le n,$$

$$N_{\mu n (1 + \epsilon)} \ge n.$$

Equivalently, for all $\epsilon > 0$, if $t$ is sufficiently large,

$$\frac{t}{\mu(1+\epsilon)} \le N_t \le \frac{t}{\mu(1-\epsilon)}.$$

This gives the following.


Law of Large Numbers. With probability one,

$$\lim_{t \to \infty} \frac{N_t}{t} = \frac{1}{\mu}. \tag{6.2}$$


We now derive a central limit theorem for renewal processes. Assume that
the variance of each $T_i$ is $\sigma^2 < \infty$. Then the usual central limit theorem states
that the distribution of

$$\frac{T_1 + \cdots + T_n - n\mu}{\sigma \sqrt{n}}$$

approaches a unit normal (i.e., a normal random variable with mean 0, variance
1). Slightly more informally we can say that for large $n$

$$T_1 + \cdots + T_n \approx n\mu + \sigma \sqrt{n}\, B,$$

where $B$ is a unit normal. This states that the number of occurrences in time
$n\mu + \sigma\sqrt{n}\, B$ is $n$. From (6.2), we would expect the number of occurrences in
the time interval of size $\sigma\sqrt{n}\,|B|$ to be about $\sigma\sqrt{n}\,|B|/\mu$. Hence we have the
number of occurrences in time $n\mu$ is about

$$n - \frac{\sigma \sqrt{n}}{\mu}\, B.$$

If we write $t$ for $n\mu$ and note that $-B$ is also a unit normal random variable
we see that

$$N_t \approx \frac{t}{\mu} + \frac{\sigma}{\mu^{3/2}} \sqrt{t}\, B,$$

where $B$ is a unit normal. While this is only a rough sketch, this argument
can be made rigorous, giving a central limit theorem for renewal processes.


Central Limit Theorem. If the waiting times $T_i$ have mean $\mu$ and variance
$\sigma^2$, then as $t \to \infty$ the distribution of

$$\frac{N_t - t/\mu}{\sigma \mu^{-3/2} \sqrt{t}}$$

approaches a standard normal distribution.
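
Both limit theorems for the renewal process can be illustrated by simulation. The following is a minimal sketch under stated assumptions: the function names `renewal_count` and `simulate_counts` are hypothetical, and the lifetimes are taken to be uniform on $[0,2]$ (so the mean is 1 and the variance is 1/3) purely for illustration. With time horizon 500, the counts should average about 500 with a standard deviation near $\sqrt{500/3} \approx 12.9$.

```python
import random
import statistics

def renewal_count(t, draw_lifetime):
    """Number of renewals by time t: the largest n with T_1 + ... + T_n <= t."""
    total, n = 0.0, 0
    while True:
        total += draw_lifetime()
        if total > t:
            return n
        n += 1

def simulate_counts(t, runs, rng):
    """Independent copies of the renewal count at time t.
    Assumed lifetimes: uniform on [0, 2], so mean 1 and variance 1/3."""
    return [renewal_count(t, lambda: rng.uniform(0.0, 2.0)) for _ in range(runs)]

if __name__ == "__main__":
    rng = random.Random(3)
    t = 500.0
    counts = simulate_counts(t, 200, rng)
    print("mean of N_t / t :", statistics.mean(counts) / t)    # compare with 1 / mean lifetime
    print("std dev of N_t  :", statistics.stdev(counts))       # compare with sqrt(t / 3)
```

The standard deviation to compare against comes from the scaling in the central limit theorem, the lifetime standard deviation times the mean to the power $-3/2$ times $\sqrt{t}$, evaluated for the assumed uniform lifetimes.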


Example 5. This kind of informal reasoning can be applied to more complicated
examples. Suppose we have a continuous-time Markov chain $X_t$ on
state space $\{1, 2\}$ with $\alpha(1,2) = \alpha_1$ and $\alpha(2,1) = \alpha_2$. Assume $X_0 = 1$ and
let $Y_t$ denote the amount of time spent in state 1 up to time $t$,

$$Y_t = \int_0^t I\{X_s = 1\}\, ds.$$

Define $R_i$ and $S_i$ as in Example 3 above (with $x = 1$). The random variables
$R_i$ are exponential with rate $\alpha_1$ and hence have mean $\mu_1 = 1/\alpha_1$ and variance
$\sigma_1^2 = 1/\alpha_1^2$. Similarly the random variables $S_i$ are exponential with mean
$\mu_2 = 1/\alpha_2$ and variance $\sigma_2^2 = 1/\alpha_2^2$. For large $n$ the central limit theorem
states that

$$R_1 + \cdots + R_n \approx n\mu_1 + \sigma_1 \sqrt{n}\, B_1,$$

$$S_1 + \cdots + S_n \approx n\mu_2 + \sigma_2 \sqrt{n}\, B_2,$$

where $B_1$ and $B_2$ are independent unit normals. In other words, in time
$n(\mu_1 + \mu_2) + \sqrt{n}(\sigma_1 B_1 + \sigma_2 B_2)$, the amount of time spent in state 1 is approximately
$n\mu_1 + \sqrt{n}\, \sigma_1 B_1$. For large $t$, the amount of time spent in state 1 in an interval
$[t, t + \Delta t]$ is about $\Delta t\, [\mu_1/(\mu_1 + \mu_2)]$. Hence the amount of time spent in state
1 up through time $(\mu_1 + \mu_2)n$ is approximately

$$n\mu_1 + \sqrt{n}\, \sigma_1 B_1 - \frac{\mu_1}{\mu_1 + \mu_2} \sqrt{n}\, (\sigma_1 B_1 + \sigma_2 B_2) = n\mu_1 + \frac{\sqrt{n}}{\mu_1 + \mu_2} [\mu_2 \sigma_1 B_1 - \mu_1 \sigma_2 B_2].$$

Since $B_1$ and $B_2$ are independent, we can write this as

$$n\mu_1 + \sqrt{n}\, \sqrt{\left(\frac{\sigma_1 \mu_2}{\mu_1 + \mu_2}\right)^2 + \left(\frac{\sigma_2 \mu_1}{\mu_1 + \mu_2}\right)^2}\; B = \frac{1}{\alpha_1}\, n + \frac{\sqrt{2n}}{\alpha_1 + \alpha_2}\, B,$$

where $B$ is a unit normal. If we let $t = (\mu_1 + \mu_2)n$ we see that the distribution
of

$$\frac{Y_t - \frac{\alpha_2}{\alpha_1 + \alpha_2}\, t}{\sigma \sqrt{t}}$$

approaches a unit normal where

$$\sigma^2 = \frac{2 \alpha_1 \alpha_2}{(\alpha_1 + \alpha_2)^3}.$$

6.2 Renewal Equation 


We are interested in the large-time behavior of renewal processes. Assume
we have a renewal process with waiting times $T_1, T_2, \ldots$ with mean $\mu$ as
defined in the previous section. For $t \ge 0$, we let $U(t)$ be the expected
number of occurrences up through time $t$, where for convenience we will say
that an event occurs at time 0. In other words,

$$U(t) = E(N_t + 1).$$


Renewal Theorem I.

$$\lim_{t \to \infty} \frac{U(t)}{t} = \frac{1}{\mu}. \tag{6.3}$$

This is almost a consequence of (6.2); one does need to be a little careful,
however, because it is possible for random variables to converge without the
expectations converging. We leave the derivation of (6.3) from (6.2) to the
exercises (Exercise 6.5).

To analyze the large-time behavior of renewal processes we will need a
second, stronger version of the renewal theorem. The second renewal theorem
can be thought of as a "derivative" form of (6.3) or as a statement that the
renewal process converges to a steady state. The second renewal theorem
states that under appropriate hypotheses, for every $r > 0$,

$$\lim_{t \to \infty} [U(t+r) - U(t)] = \frac{r}{\mu}, \tag{6.4}$$

i.e., for large $t$, the expected number of renewals in any interval of length $r$ is
about $r/\mu$. It is not too difficult to see that some restrictions must be put on
the distribution for (6.4) to hold. For example, if the waiting times $T_i$ take
on only integer values, then for every integer $n$,

$$U(n) = U\!\left(n + \tfrac{1}{2}\right),$$

since renewals occur only at integer times. It turns out that this is really the
only thing that can go wrong. We say that a nonnegative random variable $X$
has a lattice distribution if there exists a number $a > 0$ such that with probability
one the value of $X$ lies in

$$\{0, a, 2a, 3a, \ldots\},$$

and we call the largest such $a$ the period of the distribution. Otherwise we
say that $X$ has a nonlattice distribution. We now state the second renewal
theorem.


Renewal Theorem II. If $T_1, T_2, \ldots$ have a nonlattice distribution, then for
every $r > 0$,

$$\lim_{t \to \infty} [U(t+r) - U(t)] = \frac{r}{\mu}.$$

If the $T_1, T_2, \ldots$ have a lattice distribution with period $a$, then

$$\lim_{n \to \infty} [U((n+1)a) - U(na)] = \frac{a}{\mu}.$$
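
The lattice form of the theorem is easy to observe numerically for integer-valued lifetimes. The following sketch rests on assumptions made here for illustration: the function name `renewal_probabilities` and the two example distributions are not from the text. It computes the probability that a renewal occurs at each integer time, conditioning on the length of the first lifetime; the increment of the renewal function over one period is then a sum of these probabilities.

```python
def renewal_probabilities(pmf, n_max):
    """u[n] = probability that a renewal occurs at time n, for lifetimes taking
    values in the positive integers with pmf[k] = P{T = k}.
    Conditioning on the first lifetime gives u[n] = sum_k pmf[k] * u[n - k]."""
    u = [1.0] + [0.0] * n_max            # by convention a renewal occurs at time 0
    for n in range(1, n_max + 1):
        u[n] = sum(p * u[n - k] for k, p in pmf.items() if k <= n)
    return u

if __name__ == "__main__":
    # Period 1: lifetimes 1, 2, 3 with mean 1.7, so increments approach 1 / 1.7.
    u = renewal_probabilities({1: 0.5, 2: 0.3, 3: 0.2}, 60)
    print(u[60], 1 / 1.7)
    # Period 2: lifetimes 2 or 4 with mean 3, so increments over one period approach 2 / 3.
    v = renewal_probabilities({2: 0.5, 4: 0.5}, 100)
    print(v[99], v[100], 2 / 3)
```

In the second example no renewal can occur at an odd time, which is the same obstruction as in the remark that the renewal function is constant between lattice points.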


We will not give a proof of the nonlattice form of this theorem, but rather
will concentrate on showing how it is used. In the next section we will relate
the lattice form of this theorem to known results about positive recurrent
Markov chains. Let $F$ denote the distribution of $T_i$. Recall that the convolution
of two distributions $F, G$ of nonnegative random variables is defined
by

$$F * G(t) = \int_0^t F(t-s)\, dG(s) = \int_0^t G(t-s)\, dF(s).$$

The convolution $F * G$ gives the distribution function of the sum of two independent
random variables with distribution functions $F$ and $G$ respectively.
Let $F$ be the distribution function for the $T_i$. We will write $F^{(n)}$ for the convolution
of $F$ $n$ times, i.e., for the distribution function of $T_1 + \cdots + T_n$. For
convenience we will write $F^{(0)}$ for the trivial distribution function associated
to the random variable which is identically 0. Recall [see (1.13)] that if $Y$ is
a random variable taking values in the nonnegative integers, then

$$E(Y) = \sum_{n=1}^{\infty} P\{Y \ge n\}.$$

Using this, we can write the renewal function $U(t)$ as

$$\begin{aligned}
U(t) = E(N_t + 1) &= 1 + \sum_{n=1}^{\infty} P\{N_t \ge n\} \\
&= 1 + \sum_{n=1}^{\infty} P\{T_1 + \cdots + T_n \le t\} \\
&= \sum_{n=0}^{\infty} F^{(n)}(t).
\end{aligned}$$
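
For the Poisson process the series for the renewal function can be summed numerically, since the $n$-fold convolution of an exponential distribution is an Erlang distribution. The sketch below is illustrative: the function names `convolution_power_cdf` and `renewal_function` and the truncation at 200 terms are assumptions made here. For exponential lifetimes with rate `lam` the sum should agree with one plus `lam * t`, the expected number of events by time $t$ plus the event counted at time 0.

```python
import math

def convolution_power_cdf(n, t, lam):
    """F^(n)(t) = P{T_1 + ... + T_n <= t} for exponential lifetimes with rate lam."""
    if n == 0:
        return 1.0                        # F^(0) belongs to the random variable that is identically 0
    # Erlang distribution function: 1 - sum_{k < n} exp(-lam t) (lam t)^k / k!
    term = math.exp(-lam * t)
    partial = 0.0
    for k in range(n):
        partial += term
        term *= lam * t / (k + 1)
    return 1.0 - partial

def renewal_function(t, lam, n_terms=200):
    """Truncated series U(t) = sum over n >= 0 of F^(n)(t)."""
    return sum(convolution_power_cdf(n, t, lam) for n in range(n_terms))

if __name__ == "__main__":
    t, lam = 3.0, 2.0
    print(renewal_function(t, lam), 1.0 + lam * t)
```

The truncation is harmless here because the terms decay faster than geometrically once $n$ exceeds `lam * t`; for much larger values of `lam * t` more terms would be needed.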
Let $A_t$ denote the time elapsed since the last renewal,

$$A_t = \begin{cases} t, & \text{if } N_t = 0, \\ t - (T_1 + \cdots + T_n), & \text{if } N_t = n. \end{cases}$$

If we think of the times $T_i$ as being lifetimes of some component, then $A_t$
represents the age of the current component. We would like to determine the
steady-state distribution of $A_t$, i.e., we would like to determine for each $x$,

$$\Psi(x) = \lim_{t \to \infty} P\{A_t \le x\}.$$

We will condition on the first renewal. One way for $A_t$ to be at most $x$ is for
no event to have occurred up through time $t$ and $t \le x$. This corresponds to
$t < T_1$ and has probability $1 - F(t)$ if $t \le x$. If the first renewal has occurred
before time $t$, at time $s$ say, then the renewal process starts over and there is
time $t - s$ left until time $t$. From this we get the equation

$$P\{A_t \le x\} = 1_{[0,x]}(t)\, [1 - F(t)] + \int_0^t P\{A_{t-s} \le x\}\, dF(s). \tag{6.5}$$

Here $1_{[0,x]}(t)$ denotes the function that equals 1 for $0 \le t \le x$ and equals zero
otherwise. If we let $\phi(t) = \phi(t, x) = P\{A_t \le x\}$, then this becomes

$$\phi(t) = 1_{[0,x]}(t)\, [1 - F(t)] + \int_0^t \phi(t-s)\, dF(s).$$


This is an example of a renewal equation. We will now consider solutions 
to renewal equations of the form 

$$\phi(t) = h(t) + \int_0^t \phi(t - s) \, dF(s), \qquad (6.6)$$

or in the language of convolutions, 

$$\phi(t) = h(t) + \phi * F(t).$$




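We will need the associativity property for convolutions: if $F$ and $G$ are 
distribution functions, 

$$(\phi * F) * G(t) = \phi * (F * G)(t). \qquad (6.7)$$

Let us derive this in the case where $F$ and $G$ have densities, so that 
$dF(t) = f(t) \, dt$ and $dG(t) = g(t) \, dt$. In this case 

$$
\begin{aligned}
(\phi * F) * G(t) &= \int_0^t (\phi * F)(t - s) \, g(s) \, ds \\
&= \int_0^t \left[ \int_0^{t-s} \phi(t - s - r) \, f(r) \, dr \right] g(s) \, ds \\
&= \int_0^t \left[ \int_s^t \phi(t - y) \, f(y - s) \, dy \right] g(s) \, ds \\
&= \int_0^t \phi(t - y) \left[ \int_0^y f(y - s) \, g(s) \, ds \right] dy \\
&= \int_0^t \phi(t - y) \, (f * g)(y) \, dy \\
&= \phi * (F * G)(t).
\end{aligned}
$$

Here $(f * g)(y) = (d/dy)(F * G)(y)$ denotes the density of the sum of two 
independent random variables with densities $f$ and $g$, respectively. 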

We will first show that there is only one solution to (6.6) in the sense that 
there is at most one $\phi(t)$ that satisfies (6.6) with $\phi(t) = 0$ for $t < 0$ 
and such that for each $t$ there is a number $M = M_t < \infty$ with 
$|\phi(s)| \le M$ for all $0 \le s \le t$. Assume there were two such solutions, 
$\phi_1(t)$ and $\phi_2(t)$, for a given $h$. Then $\psi(t) = \phi_1(t) - \phi_2(t)$ 
satisfies $|\psi(s)| \le 2M$, $0 \le s \le t$, and 

$$\psi(t) = \int_0^t \psi(t - s) \, dF(s).$$

If we iterate this, using (6.7), we see for each $n$, 

$$\psi(t) = \int_0^t \psi(t - s) \, dF^{(n)}(s).$$

But, 

$$|\psi(t)| = \left| \int_0^t \psi(t - s) \, dF^{(n)}(s) \right| \le 2M \, F^{(n)}(t).$$

For fixed $t$, $F^{(n)}(t) \to 0$ as $n \to \infty$. This shows that $\psi(t) = 0$. 

Now that we know there is only one solution, we need only produce a 
solution. Let 

$$
\begin{aligned}
\phi(t) = \int_0^t h(t - s) \, dU(s) &= \sum_{n=0}^{\infty} \int_0^t h(t - s) \, dF^{(n)}(s) \\
&= h(t) + \sum_{n=1}^{\infty} \int_0^t h(t - s) \, dF^{(n)}(s).
\end{aligned}
$$

Then one can see, using (6.7), that this satisfies (6.6). This therefore gives 
the unique solution. 
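
The uniqueness and existence argument can be checked numerically in a lattice 
analogue of (6.6). The sketch below is an illustration under an assumption that 
is not part of the text: time is integer valued, $p_k = P\{T_i = k\}$ with 
$p_0 = 0$, and the equation reads $\phi(n) = h(n) + \sum_{k=1}^{n} \phi(n-k) p_k$. 
The function names are hypothetical. One function solves the equation by direct 
recursion, the other forms the convolution of $h$ with the renewal masses 
$u(k) = \sum_m P\{T_1 + \cdots + T_m = k\}$, and the two should agree. 

```python
from fractions import Fraction


def renewal_mass(p, n_max):
    """u(n) = sum over m >= 0 of P{T_1 + ... + T_m = n}, for n = 0..n_max.

    p maps a positive integer k to P{T = k}; p_0 = 0 is assumed.
    """
    u = [Fraction(1)]  # the empty sum (m = 0) sits at time 0
    for n in range(1, n_max + 1):
        u.append(sum(p.get(k, 0) * u[n - k] for k in range(1, n + 1)))
    return u


def solve_by_recursion(h, p, n_max):
    """Solve phi(n) = h(n) + sum_{k=1}^{n} phi(n - k) p_k step by step."""
    phi = []
    for n in range(n_max + 1):
        phi.append(h(n) + sum(phi[n - k] * p.get(k, 0) for k in range(1, n + 1)))
    return phi


def solve_by_convolution(h, p, n_max):
    """Evaluate the candidate solution phi(n) = sum_{k=0}^{n} h(n - k) u(k)."""
    u = renewal_mass(p, n_max)
    return [sum(h(n - k) * u[k] for k in range(n + 1)) for n in range(n_max + 1)]


# Hypothetical example: lifetimes are 1 or 2 with equal probability.
p = {1: Fraction(1, 2), 2: Fraction(1, 2)}
h = lambda n: Fraction(1) if n <= 3 else Fraction(0)
by_recursion = solve_by_recursion(h, p, 12)
by_convolution = solve_by_convolution(h, p, 12)
```

Exact rational arithmetic is used so that the two solutions can be compared 
with equality rather than with a tolerance. 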

Let us now assume that $F$ is a nonlattice distribution. Another way of 
stating the second renewal theorem is to say that for large $s$, 

$$dU(s) \approx \mu^{-1} \, ds.$$

If $h(t)$ is a bounded function with $\int_0^{\infty} |h(t)| \, dt < \infty$, then 
this implies that 

$$\lim_{t \to \infty} \int_0^t h(t - s) \, dU(s) = \lim_{t \to \infty} \int_0^t h(s) \, dU(t - s) = \frac{1}{\mu} \int_0^{\infty} h(s) \, ds. \qquad (6.8)$$


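Since the age distribution $A_t$ satisfies (6.5), we can conclude that the 
large-time age distribution function $\Psi_A(x)$ is given by 

$$
\begin{aligned}
\Psi_A(x) = \lim_{t \to \infty} P\{A_t \le x\} &= \frac{1}{\mu} \int_0^{\infty} 1_{[0,x]}(s) \, [1 - F(s)] \, ds \\
&= \frac{1}{\mu} \int_0^x [1 - F(s)] \, ds.
\end{aligned}
$$

Note that 

$$
\begin{aligned}
\lim_{x \to \infty} \Psi_A(x) &= \frac{1}{\mu} \int_0^{\infty} [1 - F(s)] \, ds \\
&= \frac{1}{\mu} \int_0^{\infty} \int_s^{\infty} dF(r) \, ds \\
&= \frac{1}{\mu} \int_0^{\infty} r \, dF(r) = 1,
\end{aligned}
$$

so this gives a valid distribution function. It has density 

$$\psi_A(x) = \Psi_A'(x) = \frac{1}{\mu} [1 - F(x)], \qquad 0 < x < \infty.$$
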
Example 1. Suppose that the waiting times are exponential with rate $\lambda$, so 
that $F(t) = 1 - e^{-\lambda t}$, $\mu = 1/\lambda$. Then 

$$\Psi_A(x) = \lim_{t \to \infty} P\{A_t \le x\} = \frac{1}{\mu} \int_0^x e^{-\lambda s} \, ds = 1 - e^{-\lambda x}.$$

Hence the large-time age distribution for a Poisson process with rate $\lambda$ is 
an exponential distribution with rate $\lambda$. This is very plausible: at a large 
time $t$, the age $A_t$ is the amount of time in the past one must go to see an 
event. This reverse process also looks like a Poisson process, so the time until 
an event should be exponential. 


Example 2. Suppose that the waiting time distribution is uniform on $[0, 10]$ 
so that $F(t) = t/10$, $0 \le t \le 10$, and $\mu = 5$. Then the age $A_t$ is always 
less than 10 and for $x < 10$, 

$$\Psi_A(x) = \frac{1}{5} \int_0^x \left( 1 - \frac{s}{10} \right) ds = \frac{x}{5} - \frac{x^2}{100}.$$

Note in this case (as in essentially all cases but for exponential waiting times) 
the large-time age distribution is not the same as the waiting time distribution. 
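
As a sanity check on this example, the integral 
$(1/\mu) \int_0^x [1 - F(s)] \, ds$ can be evaluated numerically and compared 
with $x/5 - x^2/100$. The sketch below is only an illustration: the function 
names and the choice of a midpoint rule are assumptions made here, not part of 
the text. 

```python
def limiting_age_cdf(x, survival, mu, steps=10_000):
    """Midpoint-rule approximation of (1/mu) * integral_0^x [1 - F(s)] ds."""
    width = x / steps
    total = sum(survival((i + 0.5) * width) for i in range(steps))
    return total * width / mu


def uniform_survival(s):
    """1 - F(s) for lifetimes uniform on [0, 10]."""
    return max(0.0, 1.0 - s / 10.0)


x = 4.0
approx = limiting_age_cdf(x, uniform_survival, mu=5.0)
closed_form = x / 5 - x**2 / 100
total_mass = limiting_age_cdf(10.0, uniform_survival, mu=5.0)
```

Replacing `uniform_survival` and `mu` with another survival function and its 
mean gives the corresponding large-time age distribution. 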


We will now consider two other processes, the residual life 

$$B_t = \inf\{s : N_{t+s} > N_t\},$$

and the total lifetime 

$$C_t = A_t + B_t.$$

The residual life gives the amount of time until the current component in a 
system fails. Consider $P\{B_t \le x\}$. There are two ways for $B_t$ to be at most 
$x$. One way is for there to be no renewals up to time $t$ and $B_t \le x$. This 
corresponds to $t < T_1 \le t + x$, which has probability $F(t + x) - F(t)$. The 
other possibility is that there is a first renewal at time $s \le t$, in which case 
we need to consider $\{B_{t-s} \le x\}$. This gives the renewal equation 

$$P\{B_t \le x\} = [F(t + x) - F(t)] + \int_0^t P\{B_{t-s} \le x\} \, dF(s).$$

The solution to this renewal equation is 

$$P\{B_t \le x\} = \int_0^t [F(t - s + x) - F(t - s)] \, dU(s).$$


From (6.8), we can determine the large-time residual life distribution function 
$\Psi_B(x)$, 

$$
\begin{aligned}
\Psi_B(x) &= \lim_{t \to \infty} \int_0^t [F(t - s + x) - F(t - s)] \, dU(s) \\
&= \lim_{t \to \infty} \int_0^t [F(s + x) - F(s)] \, dU(t - s) \\
&= \frac{1}{\mu} \left[ \int_0^{\infty} [1 - F(s)] \, ds - \int_0^{\infty} [1 - F(s + x)] \, ds \right] \\
&= \frac{1}{\mu} \left[ \int_0^{\infty} [1 - F(s)] \, ds - \int_x^{\infty} [1 - F(r)] \, dr \right] \\
&= \frac{1}{\mu} \int_0^x [1 - F(s)] \, ds.
\end{aligned}
$$

What we see is that the large-time distribution function for the residual life 
is the same as that for the age distribution. If one thinks about this, it is 
reasonable. Consider every lifetime $T_i$. For every $r, s$ with $r + s = T_i$, there 
will correspond one time $t$ when $A_t = r$, $B_t = s$ and another time $u$ when 
$A_u = s$, $B_u = r$. By this symmetry, we would expect $A_t$ and $B_t$ to have the 
same limiting distribution. 

Now consider the total lifetime $C_t$ and $P\{C_t \le x\}$. One way for $C_t$ to be 
at most $x$ is for there to be no renewals up through time $t$ and the total 
lifetime at most $x$. This corresponds to $t < T_1 \le x$, which has probability 
$F(x) - F(t)$. The other possibility is that the first event occurs at some 
$s \le t$, in which case we need to consider $P\{C_{t-s} \le x\}$. This gives the 
renewal equation 

$$P\{C_t \le x\} = 1_{[0,x]}(t) \, [F(x) - F(t)] + \int_0^t P\{C_{t-s} \le x\} \, dF(s).$$

By solving the renewal equation and using (6.8), we see that the limiting 
distribution for the lifetime, $\Psi_C(x)$, is given by 

$$
\begin{aligned}
\Psi_C(x) &= \lim_{t \to \infty} \int_0^t 1_{[0,x]}(t - s) \, [F(x) - F(t - s)] \, dU(s) \\
&= \lim_{t \to \infty} \int_0^t 1_{[0,x]}(s) \, [F(x) - F(s)] \, dU(t - s) \\
&= \frac{1}{\mu} \int_0^{\infty} 1_{[0,x]}(s) \, [F(x) - F(s)] \, ds \\
&= \frac{1}{\mu} \left[ x F(x) - \int_0^x F(s) \, ds \right].
\end{aligned}
$$

This formula is best understood in the case where $F$ has a density $f(t)$. In 
this case $\Psi_C(x)$ has density 

$$\psi_C(x) = \Psi_C'(x) = \frac{1}{\mu} \, x f(x). \qquad (6.10)$$

This can be understood intuitively. Suppose $x < y$. Then the relative 
"probability" of waiting times of size $x$ and size $y$ is $f(x)/f(y)$. However, 
every waiting time of size $y$ uses up $y$ units of time while a waiting time of 
size $x$ uses up $x$ units of time. So the ratio of times in an interval of size 
$x$ to an interval of size $y$ should be $x f(x) / y f(y)$. The $1/\mu$ can easily 
be seen to be the appropriate normalization factor to make this a probability 
density. 


Example 3. If the waiting times are exponential with rate $\lambda$, then 
$\mu = 1/\lambda$ and $\Psi_A$ and $\Psi_B$ have exponential distributions with rate 
$\lambda$. Note that $\Psi_C$ has density 

$$\psi_C(x) = \lambda^2 x e^{-\lambda x}.$$

This is the density of a Gamma distribution with parameters 2 and $\lambda$ and is 
the density of the sum of two independent exponential random variables with 
rate $\lambda$. For large times, the age and the residual life are independent 
random variables. 


Example 4. If $F$ is uniform on $[0, 10]$, then $\mu = 5$, and $\Psi_A$ and $\Psi_B$ 
are given by (6.9) with densities 

$$\psi_A(x) = \psi_B(x) = \frac{1}{5} - \frac{x}{50}, \qquad 0 < x < 10.$$

Note that the expected age or the expected residual life in the long run is 
given by 

$$\int_0^{10} x \left( \frac{1}{5} - \frac{x}{50} \right) dx = \frac{10}{3}.$$

The density of $\Psi_C$ is given by 

$$\psi_C(x) = \frac{1}{5} \, x f(x) = \frac{x}{50}, \qquad 0 < x < 10.$$

It is easy to check that the age and residual life are not asymptotically 
independent in this case, e.g., there is a positive probability that the age is 
over 8 and a positive probability that the residual life is over 8, but it is 
impossible for both of them to be over 8 since the total lifetime is bounded by 
10. 


Suppose one is replacing components as they fail and the lifetimes are 
independent with distribution $F$. Suppose we consider the system at some large 
$t$, and ask how long the present component is expected to last. This is 
equivalent to finding the expected value of the residual life. This is given by 

$$\int_0^{\infty} x \, \psi_B(x) \, dx = \frac{1}{\mu} \int_0^{\infty} x \, [1 - F(x)] \, dx = \frac{1}{2\mu} \int_0^{\infty} x^2 \, dF(x).$$

The last equality is obtained by integrating by parts. It is easy to give 
examples (see Exercise 6.6) of distributions or densities $f(x)$ such that 

$$\mu < \frac{1}{2\mu} \int_0^{\infty} x^2 f(x) \, dx.$$

In fact, it is possible for $\mu < \infty$ and the expected residual lifetime to be 
infinite. This may be surprising at first; however, a little thought will show 
that this is not so unreasonable. 
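
To make the comparison concrete, the expected large-time residual life 
$E(T^2)/(2\mu)$ can be computed from the first two moments of the lifetime. The 
sketch below uses two cases: the uniform distribution on $[0, 10]$ from Example 
4, and a hypothetical mixture of two exponentials (an assumption chosen here for 
illustration, not taken from the text) for which the expected residual life 
exceeds the mean lifetime. 

```python
from fractions import Fraction


def expected_residual_life(mean, second_moment):
    """Large-time expected residual life, E(T^2) / (2 E(T))."""
    return second_moment / (2 * mean)


def mixture_of_exponentials_moments(components):
    """First two moments of a mixture; components is a list of (weight, rate).

    An exponential with rate r has mean 1/r and second moment 2/r^2.
    """
    mean = sum(w / r for w, r in components)
    second = sum(2 * w / (r * r) for w, r in components)
    return mean, second


# Uniform on [0, 10]: mean 5, second moment 100/3.
uniform_residual = expected_residual_life(Fraction(5), Fraction(100, 3))

# Hypothetical mixture: rate 1 with weight 9/10, rate 1/10 with weight 1/10.
mix_mean, mix_second = mixture_of_exponentials_moments(
    [(Fraction(9, 10), Fraction(1)), (Fraction(1, 10), Fraction(1, 10))]
)
mix_residual = expected_residual_life(mix_mean, mix_second)
```

In the mixture, the rare long lifetimes occupy a disproportionate share of the 
time axis, which is why a component observed at a large time tends to have 
more life remaining than a fresh one has on average. 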

We finish this section by describing how to create a "stationary renewal 
process." Suppose $T_1, T_2, \ldots$ are independent with nonlattice distribution 
$F$. Let $\Psi_B$ be the large-time residual life distribution and let $Y$ be a 
random variable independent of $T_1, T_2, \ldots$ with distribution function 
$\Psi_B$. Define $N_t$ as in (6.1). Then $N_t$ looks like a renewal process in 
steady state. It has the property that for every $s < t$, $N_t - N_s$ has the same 
distribution as $N_{t-s}$. 


6.3 Discrete Renewal Processes 


In this section we will suppose that the random variables $T_1, T_2, \ldots$ are 
lattice random variables. Without loss of generality we will assume that the 
period $a$ as defined in Section 6.2 is equal to 1 (the period is always equal to 
1 in some choice of time units). Let $F$ be the distribution function for the 
$T_i$ and let 

$$p_n = P\{T_i = n\} = F(n) - F(n - 1).$$

We will assume for ease that $p_0 = 0$; if $p_0 > 0$ we can make a slight adjustment 
of the methods in this section (see Exercise 6.10). Since the period is 1, the 
greatest common divisor of the set 

$$\{n : p_n > 0\}$$

is 1. As before set 

$$\mu = E(T_i) = \sum_{n=1}^{\infty} n \, p_n,$$

and we assume $\mu < \infty$. 
Let $N_j$ denote the number of events that have occurred up through and 
including time $j$, i.e., $N_j = 0$ if $j < T_1$ and otherwise 

$$N_j = \max\{n : T_1 + \cdots + T_n \le j\}.$$

We can also define the age process $A_j$ by $A_j = j$ if $j < T_1$ and otherwise 

$$A_j = j - (T_1 + \cdots + T_{N_j}).$$

The key fact is that $A_j$ is a Markov chain. Let 

$$\lambda_n = P\{T_i = n \mid T_i > n - 1\} = \frac{p_n}{1 - F(n - 1)}.$$

Then $A_j$ is a discrete-time Markov chain with transition probabilities 

$$p(n, 0) = \lambda_{n+1}, \qquad p(n, n + 1) = 1 - \lambda_{n+1}.$$

Let $K$ be the largest number $k$ such that $p_k > 0$ (where $K = \infty$ if 
$p_k > 0$ for infinitely many $k$). Then $A_j$ is an irreducible Markov chain with 
state space $\{0, 1, \ldots, K - 1\}$ if $K < \infty$ and state space 
$\{0, 1, \ldots\}$ if $K = \infty$. The chain is also aperiodic since we assumed 
the period of $F$ is 1. We start with $A_0 = 0$ and note that the $n$th return to 
state 0 occurs at time $T_1 + \cdots + T_n$. The condition $E(T_i) < \infty$ implies 
that $A_j$ is a positive recurrent chain. 

The invariant probability $\pi$ for this chain can be obtained by solving the 
equations 

$$
\begin{aligned}
\pi(n + 1) &= p(n, n + 1) \, \pi(n) \\
&= (1 - \lambda_{n+1}) \, \pi(n) \\
&= \frac{1 - F(n + 1)}{1 - F(n)} \, \pi(n), \qquad n \ge 0,
\end{aligned}
$$

$$\pi(0) = \sum_{n=0}^{\infty} p(n, 0) \, \pi(n) = \sum_{n=0}^{\infty} \lambda_{n+1} \, \pi(n).$$

The first equations can be solved recursively to yield 

$$\pi(n) = [1 - F(n)] \, \pi(0).$$

To find the value for $\pi(0)$ for which $\sum \pi(n) = 1$, we check that 

$$\sum_{n=0}^{\infty} [1 - F(n)] = \sum_{n=0}^{\infty} \sum_{m=n+1}^{\infty} p_m = \sum_{m=1}^{\infty} m \, p_m = \mu.$$

In particular, 

$$\pi(n) = \frac{1 - F(n)}{\mu}, \qquad n \ge 0.$$

Note that 

$$P\{\text{an event at time } j\} = P\{N_j > N_{j-1}\} = P\{A_j = 0\}.$$

Since $A_j$ is an aperiodic, irreducible, positive recurrent Markov chain we know 
that 

$$\lim_{j \to \infty} P\{A_j = 0\} = \pi(0) = \frac{1}{\mu}.$$

This gives the second renewal theorem for discrete renewal processes. 
We have also derived the large-time age distribution, 

$$\psi_A(n) = \lim_{j \to \infty} P\{A_j = n\} = \pi(n) = \frac{1 - F(n)}{\mu}.$$

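
The claim that $\pi(n) = [1 - F(n)]/\mu$ is invariant for the age chain can be 
verified exactly for a small example. The sketch below is illustrative only: 
the function names are hypothetical, the lifetime distribution (uniform on 
$\{1, 2, 3\}$) is an assumption chosen to keep the matrix small, and the code 
assumes the last entry of `p` is positive so that $K$ equals `len(p)`. 

```python
from fractions import Fraction


def age_chain(p):
    """Transition matrix of the age chain on states 0..K-1.

    p[n - 1] = P{T = n}; the last entry of p is assumed to be positive.
    """
    K = len(p)
    P = [[Fraction(0)] * K for _ in range(K)]
    survival = Fraction(1)            # 1 - F(n), starting with n = 0
    for n in range(K):
        lam = p[n] / survival         # lambda_{n+1} = p_{n+1} / (1 - F(n))
        P[n][0] += lam                # a renewal occurs: age resets to 0
        if n + 1 < K:
            P[n][n + 1] = 1 - lam     # no renewal: age increases by one
        survival -= p[n]
    return P


def claimed_invariant(p):
    """pi(n) = (1 - F(n)) / mu for n = 0..K-1."""
    mu = sum((n + 1) * q for n, q in enumerate(p))
    pis, survival = [], Fraction(1)
    for q in p:
        pis.append(survival / mu)
        survival -= q
    return pis


p = [Fraction(1, 3)] * 3              # lifetimes uniform on {1, 2, 3}
P = age_chain(p)
pi = claimed_invariant(p)
# Left multiplication: (pi P)(k) = sum over n of pi(n) P(n, k).
pi_P = [sum(pi[n] * P[n][k] for n in range(len(p))) for k in range(len(p))]
```

Here $\mu = 2$, so the long-run probability of a renewal at a given time is 
$\pi(0) = 1/2$, in agreement with the discrete renewal theorem. 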
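Consider the residual life, 

$$B_j = \min\{k > 0 : N_{j+k} > N_j\}.$$

We can compute the large-time distribution of $B_j$, 

$$
\begin{aligned}
\psi_B(n) &= \lim_{j \to \infty} P\{B_j = n\} \\
&= \lim_{j \to \infty} \sum_{m=0}^{\infty} P\{A_j = m\} \, P\{B_j = n \mid A_j = m\} \\
&= \sum_{m=0}^{\infty} \pi(m) \, P\{B_j = n \mid A_j = m\} \\
&= \sum_{m=0}^{\infty} \frac{1 - F(m)}{\mu} \, \frac{p_{n+m}}{1 - F(m)} \\
&= \frac{1}{\mu} \sum_{m=0}^{\infty} p_{n+m} \\
&= \frac{1 - F(n - 1)}{\mu}.
\end{aligned}
$$

In other words, 

$$\psi_B(n) = \lim_{j \to \infty} P\{B_j = n\} = \lim_{j \to \infty} P\{A_j = n - 1\} = \psi_A(n - 1).$$

The residual life has the same large-time distribution as the age except for 
a difference of 1 which comes from the fact that the smallest value for the 
residual life is 1 while the smallest value for the age is 0. For the total lifetime 
of the component at time $j$, 

$$C_j = A_j + B_j,$$

we can compute 

$$
\begin{aligned}
\psi_C(n) &= \lim_{j \to \infty} P\{C_j = n\} \\
&= \lim_{j \to \infty} \sum_{m=0}^{n-1} P\{A_j = m\} \, P\{C_j = n \mid A_j = m\} \\
&= \sum_{m=0}^{n-1} \frac{1 - F(m)}{\mu} \, \frac{p_n}{1 - F(m)} \\
&= \frac{n \, p_n}{\mu}.
\end{aligned}
$$

This is the discrete analogue of (6.10). 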


Example 1. Bernoulli Process. The discrete analogue of the Poisson process 
is the Bernoulli process. Let $0 < p < 1$ and let $X_1, X_2, \ldots$ be independent 
random variables with $P\{X_i = 1\} = 1 - P\{X_i = 0\} = p$. Then 
$N_j = X_1 + \cdots + X_j$ represents the number of "successes" in the first $j$ 
trials of an experiment with probability $p$ of success. The waiting times $T_i$ 
have a geometric distribution 

$$P\{T_i = n\} = (1 - p)^{n-1} p, \qquad n \ge 1,$$

with $\mu = 1/p$. The asymptotic age distribution is given by 

$$\psi_A(n) = \frac{1 - F(n)}{\mu} = p \sum_{j=n+1}^{\infty} (1 - p)^{j-1} p = p (1 - p)^n,$$

i.e., the age is one less than a random variable with a geometric distribution. 
The residual life distribution is geometric with parameter $p$. The asymptotic 
lifetime distribution is given by 

$$\psi_C(n) = n p^2 (1 - p)^{n-1},$$

which is the distribution of the sum of two independent random variables 
with distributions $\psi_A$ and $\psi_B$, respectively. The age and the residual 
life are asymptotically independent. 


Example 2. Suppose $F$ is uniformly distributed on $\{1, \ldots, 10\}$ with 
$\mu = 11/2$. Then 

$$F(n) = \frac{n}{10}, \qquad n = 1, 2, \ldots, 10.$$

The asymptotic age distribution is given by 

$$\psi_A(n) = \frac{1 - F(n)}{\mu} = \frac{10 - n}{55}, \qquad n = 0, 1, \ldots, 9,$$

and for large time the residual life distribution is given by 

$$\psi_B(n) = \frac{1 - F(n - 1)}{\mu} = \frac{11 - n}{55}, \qquad n = 1, 2, \ldots, 10.$$

The asymptotic lifetime distribution is given by 

$$\psi_C(n) = \frac{n \, p_n}{\mu} = \frac{n}{55}, \qquad n = 1, 2, \ldots, 10.$$

In this case, the age and residual life are not asymptotically independent. 
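
The three limiting distributions of this section depend only on the masses 
$p_n$, so they are easy to tabulate. The following sketch (function and variable 
names are hypothetical) computes $\psi_A$, $\psi_B$, and $\psi_C$ from a finite 
list of masses and evaluates them for the uniform distribution on 
$\{1, \ldots, 10\}$. 

```python
from fractions import Fraction


def limiting_distributions(p):
    """Large-time age, residual-life, and total-lifetime distributions.

    p[n - 1] = P{T = n} for n = 1..K. Returns three dicts keyed by value.
    """
    K = len(p)
    mu = sum(n * q for n, q in enumerate(p, start=1))
    # tail[n] = P{T > n} = 1 - F(n), for n = 0..K
    tail = [sum(p[n:], Fraction(0)) for n in range(K + 1)]
    psi_A = {n: tail[n] / mu for n in range(K)}              # age: 0..K-1
    psi_B = {n: tail[n - 1] / mu for n in range(1, K + 1)}   # residual: 1..K
    psi_C = {n: n * p[n - 1] / mu for n in range(1, K + 1)}  # lifetime: 1..K
    return psi_A, psi_B, psi_C


uniform_ten = [Fraction(1, 10)] * 10
psi_A, psi_B, psi_C = limiting_distributions(uniform_ten)
```

Each of the three dictionaries sums to 1, and $\psi_B(n) = \psi_A(n - 1)$ holds 
entry by entry, matching the shift by one noted above. 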


6.4 M/G/1 and G/M/1 Queues 


We will consider Example 4 from Section 6.1. Customers arrive into a 
single-server queue from a Poisson process with rate $\lambda$. Customers are 
served (first come, first served) and the service time is a random variable with 
distribution function $F$ and mean $\mu < \infty$. We will call the service rate 
$1/\mu$ even though the service times are not exponential. The service times and 
the arrival times are independent. As mentioned before there is a natural 
renewal process involved where $R_1, R_2, \ldots$ denote the amount of time spent 
in "idle times" while $S_1, S_2, \ldots$ denote the amount of time spent in "busy 
times." If the queue starts idle, i.e., if $X_0 = 0$ where $X_t$ denotes the size 
of the queue (including the person being served) at time $t$, then the time until 
the start of the next idle time is given by $T_1 = R_1 + S_1$ and the time until 
the start of the $(n + 1)$st idle time is given by 

$$T_1 + \cdots + T_n,$$

where $T_i = R_i + S_i$. 

The times $R_i$ are exponential with rate $\lambda$, i.e., with mean $1/\lambda$. 
The distribution of the $S_i$ is more difficult to determine. However, we will be 
able to determine $E(S_i)$. Assume that the service rate is greater than the 
arrival rate, i.e., 

$$\mu \lambda < 1.$$


Consider the start of a busy time, so that $X_t = 1$. We will consider a 
discrete-time Markov chain $Y_n$ that gives the number of people in the queue 
immediately after the $n$th person has been served. We start with $Y_0 = 1$. The 
value $Y_1$ is obtained by considering the number of people who entered the queue 
during the first service time and subtracting 1 (for the person who has left the 
queue). For $i > 1$, $Y_i$ is obtained by adding to $Y_{i-1}$ the number of people 
who entered the queue while the $i$th person was being serviced and subtracting 
one. Let 

$$\tau = \min\{n : Y_n = 0\}.$$

If $U_1, U_2, \ldots$ denote the service times of the customers, then the length of 
the first busy time is given by 

$$S_1 = U_1 + U_2 + \cdots + U_\tau.$$


The $U_1, U_2, \ldots$ are independent random variables, each with distribution 
function $F$, but the $U_i$ are not independent of $\tau$. If we let 
$\mathcal{F}_n$ denote the information in $Y_1, \ldots, Y_n$ and 
$U_1, \ldots, U_n$, then $\tau$ is a stopping time with respect to 
$\{\mathcal{F}_n\}$ and $U_{n+1}, U_{n+2}, \ldots$ are independent of 
$\mathcal{F}_n$. If $E(\tau) < \infty$, then Wald's equation (5.19) implies that 

$$E(S_1) = E(U_1) \, E(\tau).$$

It was shown in Exercise 5.14 that $E(\tau) < \infty$ if $E(Y_1 - Y_0) < 0$ and in 
this case another application of Wald's equation can be made to show that 

$$E(\tau) = \frac{1}{1 - E(Y_1)}.$$

Let us compute $E(Y_1)$. The probability that $k$ people arrive in the queue 
during a service time $U_i$ is 

$$p_k = \int_0^{\infty} P\{k \text{ arrive} \mid U_i = s\} \, dF(s) = \int_0^{\infty} e^{-\lambda s} \frac{(\lambda s)^k}{k!} \, dF(s).$$

The expected number of arrivals is therefore 

$$\sum_{k=0}^{\infty} k \, p_k = \int_0^{\infty} \lambda s \, dF(s) = \lambda \mu = \frac{\lambda}{\rho},$$

where we write $\rho = 1/\mu$ for the service rate. The expected length of a busy 
time is given by 

$$E(S_1) = E(U_1) \, E(\tau) = \frac{1}{\rho - \lambda}.$$

The fraction of time that the queue is busy is given by 

$$\frac{E(S_1)}{E(R_1) + E(S_1)} = \frac{\lambda}{\rho}.$$

Note that this ratio tends to 1 as $\lambda \to \rho$. 

If $\lambda = \rho$, the chain $Y_n$ can be shown to be recurrent (see Exercise 
2.15) so that the queue size returns to 0 infinitely often. However, in the long 
run the fraction of time spent with the queue empty goes to 0. If 
$\lambda > \rho$, the chain $Y_n$ is transient, and hence the queue size goes to 
infinity. 
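
The busy-period quantities for the M/G/1 queue depend on the service 
distribution only through its mean, so they can be computed directly. The 
sketch below is a minimal illustration with hypothetical names and made-up 
parameter values; it assumes the queue starts a busy period with one customer, 
so that the expected number served in the busy period is 
$1/(1 - \lambda\mu)$. 

```python
from fractions import Fraction


def mg1_busy_period(arrival_rate, mean_service):
    """Return (E(tau), E(S_1), long-run busy fraction) for an M/G/1 queue."""
    load = arrival_rate * mean_service        # expected arrivals per service
    if load >= 1:
        raise ValueError("requires arrival_rate * mean_service < 1")
    expected_services = 1 / (1 - load)        # E(tau), by Wald's equation
    expected_busy = mean_service * expected_services  # E(S_1) = E(U_1) E(tau)
    expected_idle = 1 / arrival_rate          # E(R_1), exponential idle time
    busy_fraction = expected_busy / (expected_busy + expected_idle)
    return expected_services, expected_busy, busy_fraction


# Hypothetical parameters: lambda = 2 per hour, mean service time 1/4 hour.
lam, mu = Fraction(2), Fraction(1, 4)
rho = 1 / mu
n_served, busy_len, busy_frac = mg1_busy_period(lam, mu)
```

With these values the service rate is $\rho = 4$, the expected busy time is 
$1/(\rho - \lambda) = 1/2$, and the server is busy half of the time. 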

Now let us consider the somewhat less realistic G/M/1 queue. Here customers 
arrive one at a time with waiting times $T_1, T_2, \ldots$ having common 
distribution function $F$ with mean $1/\lambda$. There is one server and the 
service times are exponential with rate $\rho$. We will assume that the service 
rate is greater than the arrival rate, $\rho > \lambda$. 

There exists a natural Markov chain embedded in the G/M/1 queue. Consider 
$Y_n$, the number of customers in the system immediately before the $n$th 
customer arrives. (We will assume that the queue starts out empty and we 
set $Y_0 = 0$.) Then $Y_n$ can easily be checked to be a Markov chain with state 
space $\{0, 1, 2, \ldots\}$. 

To compute the transition probability for this chain we first for ease consider 
what happens if there are an infinite number of people in the queue. Let $q_k$ 
be the probability that exactly $k$ individuals are served between the arrival 
times of two successive customers. If the time between arrivals is $t$, then the 
number of customers served has a Poisson distribution with parameter $\rho t$. 
Hence 

$$q_k = \int_0^{\infty} P\{k \text{ served} \mid T_i = t\} \, dF(t) = \int_0^{\infty} e^{-\rho t} \frac{(\rho t)^k}{k!} \, dF(t).$$

The expected number served is 

$$
\begin{aligned}
\sum_{k=0}^{\infty} k \, q_k &= \sum_{k=0}^{\infty} k \int_0^{\infty} e^{-\rho t} \frac{(\rho t)^k}{k!} \, dF(t) \\
&= \int_0^{\infty} \left[ \sum_{k=0}^{\infty} k \, e^{-\rho t} \frac{(\rho t)^k}{k!} \right] dF(t) \\
&= \int_0^{\infty} \rho t \, dF(t) \\
&= \frac{\rho}{\lambda} > 1.
\end{aligned}
$$

Now if $Y_n = j$, then after the $n$th customer arrives there will be $j + 1$ 
customers in the queue. The queue will serve customers until the queue empties. 
It is easy to see then that 

$$P\{Y_{n+1} = k \mid Y_n = j\} = q_{(j+1)-k}, \qquad k = 1, \ldots, j + 1,$$

$$P\{Y_{n+1} = 0 \mid Y_n = j\} = \sum_{k \le 0} q_{(j+1)-k} = \sum_{l \ge j+1} q_l.$$

If we set $p_l = q_{1-l}$, $l = 1, 0, -1, \ldots$, then we see that $Y_n$ has 
transition probabilities 

$$p(j, k) = p_{k-j}, \qquad k = 1, \ldots, j + 1,$$

$$p(j, 0) = \sum_{k \le 0} p_{k-j}.$$

It can be shown (see Exercise 2.16) that this is a positive recurrent Markov 
chain. Its invariant probability is of the form 

$$\pi(j) = \beta^j (1 - \beta),$$

where $\beta$ is the unique solution to 

$$\beta = \sum_{k=0}^{\infty} q_k \beta^k,$$

with $\beta \in (0, 1)$. It is hard to evaluate $\beta$ analytically but it can be 
computed numerically. 
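
One way to carry out the numerical computation is fixed-point iteration. 
Summing the Poisson series inside the integral gives 
$\sum_k q_k \beta^k = \int_0^{\infty} e^{-\rho t (1 - \beta)} \, dF(t)$, so $\beta$ 
is a fixed point of $\beta \mapsto \hat{F}(\rho (1 - \beta))$, where 
$\hat{F}(s) = E(e^{-s T_i})$. The sketch below is an illustration, not a method 
taken from the text: the function names, the starting point 0, the tolerance, 
and the two interarrival distributions are all assumptions made here. 

```python
import math


def solve_beta(transform, service_rate, tol=1e-12, max_iter=10_000):
    """Iterate beta <- transform(service_rate * (1 - beta)) starting from 0.

    transform(s) should return E[exp(-s * T)] for the interarrival time T.
    """
    beta = 0.0
    for _ in range(max_iter):
        new_beta = transform(service_rate * (1.0 - beta))
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    raise RuntimeError("fixed-point iteration did not converge")


def exponential_transform(rate):
    """E[exp(-s T)] for T exponential with the given rate."""
    return lambda s: rate / (rate + s)


def deterministic_transform(d):
    """E[exp(-s T)] for T identically equal to d."""
    return lambda s: math.exp(-s * d)


# Exponential interarrivals with rate 1, service rate 2 (an M/M/1 queue).
beta_exponential = solve_beta(exponential_transform(1.0), service_rate=2.0)

# Interarrival times identically 1, service rate 2.
beta_deterministic = solve_beta(deterministic_transform(1.0), service_rate=2.0)
```

Starting from 0, the iterates increase toward the smallest fixed point in 
$[0, 1]$, which is the root in $(0, 1)$ when $\rho > \lambda$. In the exponential 
case the result should agree with $\lambda / \rho$. 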


6.5 Exercises 


6.1 Suppose the lifetime of a component T; in hours is uniformly distributed 
on [100, 200]. Components are replaced as soon as one fails and assume that 
this process has been going on long enough to reach equilibrium. 

(a) What is the probability that the current component has been in opera- 
tion for at least 50 hours? 

(b) What is the probability that the current component will last for at least 
50 more hours? 

(c) What is the probability that the total lifetime of the current component 
will be at least 150 hours? 

(d) Suppose it is known that the current component has been in operation 
for exactly 90 hours. What is the probability that it will last at least 50 more 
hours? 


6.2 Repeat Exercise 6.1 with the $T_i$ exponentially distributed with mean 
150. 


6.3 Repeat Exercise 6.1 with the 7; having density 


1 
an ae t < 200. 
f(t) ao. 00 < t < 200 


6.4 Repeat Exercise 6.1 with the 7; having distribution 


P{T; = 100} = P{T; = 200} = 1/2. 


6.5 Let $N_t$ denote the renewal process associated with independent, 
identically distributed random variables $T_1, T_2, \ldots$ with mean $\mu$. 

(a) Explain why for any positive integers $j, k$ and any $t$, the following 
inequality holds 

$$P\{N_t \ge jk\} \le [P\{N_t \ge j\}]^k.$$


(b) The law of large numbers for renewal processes, (6.2), states that for 
every € > 0 
(1 — 
im PY Ox we EY (6.12) 
Ul 


t—+00o i = wv 


Use (a) and (6.12) to conclude that for every « > 0, 
1 t(1 
im, FE (M14 Ne > “CON —o 
too f LL 
(c) Derive the first renewal theorem, (6.3). 


6.6 Assume that the waiting times T; have distribution 


9 1 
P{7;=1}=—, P{T;=1 oe 
ely igh ee a 
Note that the times T; have a nonlattice distribution. 
(a) What is the age distribution Vc(n)? 
(b) For large times, what is the expected residual life? Compare to E (7;). 


6.7 Suppose that there are two brands of replacement components, Brand 
X and Brand Y, and that for political reasons a company buys replacements 
of both types. When a Brand X component fails it is replaced with a new 
Brand Y component and when a Brand Y component fails it is replaced with 
a Brand X component. The lifetimes (measured in thousands of hours) of 
Brand X components are uniform on [1,2] and the Brand Y components have 
lifetimes that are uniform on [1,3]. Answer the following questions for large 
time $t$. 

(a) What is the probability that the current component is Brand X? 

(b) What is the distribution of the age of the current component? 

(c) What is the distribution of the total lifetime of the current component? 

(d) Would these answers be different if instead of alternating the brands, 
they used the rule that when a component fails they randomly choose a Brand 
X or Brand Y component with probability 1/2 for each? 




6.8 Suppose customers arrive in a one-server queue according to a Poisson 
process with rate $\lambda = 1$ (in hours). Suppose that the service times equal 
1/4 hour, 1/2 hour, or one hour each with probability 1/3. 

(a) Assume that the queue is empty and a customer arrives. What is the 
expected amount of time until that customer leaves? 

(b) Assume that the queue is empty and a customer arrives. What is the 
expected amount of time until the queue is empty again? 

(c) At a large time $t$ what is the probability that there are no customers in 
the queue? 


6.9 Give an example of a renewal process with $E[T_i] < \infty$ such that the 
large-time residual life distribution has infinite mean. 


6.10 Assume $T_1, T_2, \ldots$ are independent identically distributed 
nonnegative random variables with $P\{T_i = 0\} = q \in (0, 1)$. Suppose the 
distribution function of the $T_i$ is $F$ with mean $\mu$, and let $G$ be the 
conditional distribution function of the $T_i$ given that $T_i > 0$, 

$$G(x) = \frac{F(x) - q}{1 - q}.$$

Let $\tilde{T}_1, \tilde{T}_2, \ldots$ be independent, identically distributed 
random variables with distribution function $G$ and let $U(t)$ and 
$\tilde{U}(t)$ be the renewal functions associated with the $T_i$ and the 
$\tilde{T}_i$ respectively. Show that 

$$\tilde{U}(t) = (1 - q) \, U(t).$$


Chapter 7 


Reversible Markov Chains 


7.1 Reversible Processes 


In this chapter we will study a particular class of Markov chains, reversible 
chains. A large number of important chains are reversible, and we can take 
advantage of this fact in trying to understand their behavior. 

Suppose we have a continuous-time Markov chain $X_t$ taking values in state 
space $S$ (finite or countably infinite) with transition rates $\alpha(x, y)$. If 
$\pi$ is any measure on $S$, i.e., a nonnegative function on $S$, then the chain 
is said to be reversible with respect to the measure $\pi$ if for all 
$x, y \in S$, 

$$\pi(x) \, \alpha(x, y) = \pi(y) \, \alpha(y, x).$$

We will say that the chain is symmetric if for every $x, y$ 

$$\alpha(x, y) = \alpha(y, x).$$

Note that a chain is symmetric if and only if it is reversible with respect to 
the uniform measure $\pi(x) = 1$, $x \in S$. Similarly, a discrete-time Markov 
chain with transition matrix $P$ is said to be reversible with respect to $\pi$ if 

$$\pi(x) \, P(x, y) = \pi(y) \, P(y, x),$$

for all $x, y \in S$ and symmetric if $P(x, y) = P(y, x)$. In the next two sections 
we will discuss continuous-time chains, but analogous statements hold for 
discrete-time chains. 


Example 1. Let $G = (V, E)$ be a graph as in Example 5, Section 1.1. Let 
$S = V$ and 

$$\alpha(x, y) = \frac{1}{d(x)}, \qquad (x, y) \in E,$$

where $d(x)$ is the number of vertices adjacent to $x$. This is a 
continuous-time analogue of Example 5. Then this chain is reversible with 
respect to the measure $\pi(x) = d(x)$. If instead we choose 

$$\alpha(x, y) = 1, \qquad (x, y) \in E,$$

then the chain is symmetric and hence reversible with respect to the uniform 
measure. 


Example 2. Let $G = (V, E)$ be any graph and let $g : E \to [0, \infty)$. Such a 
configuration is often called a network. A network gives rise to a symmetric 
chain with transitions 

$$\alpha(x, y) = \alpha(y, x) = g(e),$$

if $e$ denotes the edge connecting $x$ and $y$. In the study of electrical 
networks the rates $g(e)$ are called conductances and their reciprocals are 
called resistances. 


Example 3. Suppose we have a birth-and-death chain on $S = \{0, 1, 2, \ldots\}$ 
with birth rates $\lambda_n$ and death rates $\mu_n$. In other words, the 
transition rates are 

$$\alpha(n, n + 1) = \lambda_n, \qquad \alpha(n, n - 1) = \mu_n.$$

Let $\pi(0) = 1$ and for $n > 0$, 

$$\pi(n) = \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}.$$

Then the chain is reversible with respect to the measure $\pi$. 


Example 4. Let $G = (V, E)$ be any graph and suppose $\pi : V \to (0, \infty)$ 
is a positive measure on $G$. Suppose each vertex is adjacent to only a finite 
number of other vertices. Define $\alpha(x, y) = 0$ if $(x, y)$ is not an edge of 
$G$ and for $(x, y) \in E$, 


Then $\alpha$ generates a chain that is reversible with respect to $\pi$. 


If a chain is reversible with respect to $\pi$, then 

$$\sum_{y \in S} \pi(y) \, \alpha(y, x) = \pi(x) \sum_{y \in S} \alpha(x, y) = \pi(x) \, \alpha(x),$$

i.e., $\pi$ is an invariant measure for $\alpha$. If the state space is finite, or 
if the state space is infinite with $\sum \pi(x) < \infty$, then we can normalize 
$\pi$ so that it is an invariant probability for $\alpha$. In particular, if 
$\alpha$ is irreducible, we know that if $\alpha$ is reversible with respect to a 
probability measure $\pi$ then $\pi$ is the (unique) invariant measure. 
Conversely, if an irreducible chain is reversible with respect to a $\pi$ with 
$\sum \pi(x) = \infty$, we can conclude that there is no invariant probability 
measure and hence the chain is null recurrent or transient. 

The reversibility condition is a way of stating that the system in equilibrium 
looks the same whether time goes forward or backward. To give an easy 
example of a nonreversible chain consider the three-state chain on 
$S = \{0, 1, 2\}$ with rates 

$$\alpha(0, 1) = \alpha(1, 2) = \alpha(2, 0) = 1,$$

$$\alpha(1, 0) = \alpha(2, 1) = \alpha(0, 2) = 2.$$

This is clearly irreducible with invariant probability measure 
$\pi(0) = \pi(1) = \pi(2) = 1/3$. If the chain were to be reversible, it would 
need to be reversible with respect to $\pi$, but clearly 

$$\pi(0) \, \alpha(0, 1) \ne \pi(1) \, \alpha(1, 0).$$
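
The reversibility condition is a finite set of equalities, so for a small chain 
it can be checked mechanically. The sketch below (hypothetical function name, 
and a birth-and-death chain with made-up rates chosen for illustration) tests 
the condition for the three-state chain above and for a four-state 
birth-and-death chain with the measure from Example 3. 

```python
from fractions import Fraction


def is_reversible(rates, pi):
    """Check pi(x) alpha(x, y) == pi(y) alpha(y, x) for all x != y.

    rates[(x, y)] = alpha(x, y); pairs missing from the dict have rate 0.
    """
    states = list(pi)
    return all(
        pi[x] * rates.get((x, y), 0) == pi[y] * rates.get((y, x), 0)
        for x in states
        for y in states
        if x != y
    )


# The three-state chain from the text with the uniform measure.
cyclic_rates = {(0, 1): 1, (1, 2): 1, (2, 0): 1, (1, 0): 2, (2, 1): 2, (0, 2): 2}
uniform = {x: Fraction(1, 3) for x in range(3)}

# Hypothetical birth-and-death chain on {0, 1, 2, 3}.
birth = [3, 2, 1]   # lambda_0, lambda_1, lambda_2
death = [1, 2, 4]   # mu_1, mu_2, mu_3
bd_rates = {}
for n in range(3):
    bd_rates[(n, n + 1)] = birth[n]
    bd_rates[(n + 1, n)] = death[n]
bd_pi = {0: Fraction(1)}
for n in range(3):
    bd_pi[n + 1] = bd_pi[n] * birth[n] / death[n]

cyclic_ok = is_reversible(cyclic_rates, uniform)
bd_ok = is_reversible(bd_rates, bd_pi)
```

The first check fails and the second succeeds, as the discussion above 
predicts. 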


7.2 Convergence to Equilibrium 


It is often useful to give estimates for the amount of time needed for the 
chain to reach a measure close to the invariant probability measure. Let $X_t$ 
be an irreducible continuous-time Markov chain with rates $\alpha(x, y)$, 
reversible with respect to the probability measure $\pi$. We will assume that the 
state space is finite, $S = \{1, \ldots, N\}$, but one can generalize these ideas 
to positive recurrent chains on an infinite state space. For ease, we will only 
consider the case where $A$ is symmetric (reversible with respect to the uniform 
measure), but these ideas hold for all reversible chains. 

There are a number of ways to measure the "distance" between two probability 
measures $\pi$ and $\nu$ on $S$. One very natural definition is the total 
variation distance defined by 

$$\|\pi - \nu\|_{TV} = \max\{|\pi(A) - \nu(A)| : A \subset S\}.$$


It is easy to see that the maximum is obtained on the set 
$A = \{x : \pi(x) > \nu(x)\}$. Therefore, 

$$
\begin{aligned}
\|\pi - \nu\|_{TV} &= \sum_{\pi(x) > \nu(x)} (\pi(x) - \nu(x)) \\
&= \frac{1}{2} \sum_{x \in S} |\pi(x) - \nu(x)| \\
&= \frac{1}{2} \sum_{x \in S} |N\pi(x) - N\nu(x)| \, \frac{1}{N}.
\end{aligned}
$$

In the last expression, the $1/N$ represents the uniform measure on $S$ and 
$N\pi$, $N\nu$ are the "derivatives" of $\pi$, $\nu$ with respect to this measure. 

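Another measure of distance which is not quite as natural but is sometimes 
easier to analyze is the $L^2$ or mean-squared distance, 

$$\|\pi - \nu\|_{L^2} = \left[ \sum_{x \in S} (N\pi(x) - N\nu(x))^2 \, \frac{1}{N} \right]^{1/2}.$$

Note that $\|\pi - \nu\|_{L^2} = N^{1/2} \|\pi - \nu\|$ where $\| \cdot \|$ denotes the 
usual Euclidean norm in $R^N$. The Cauchy-Schwarz inequality 

$$\left| \sum_x a_x b_x \right| \le \left[ \sum_x a_x^2 \right]^{1/2} \left[ \sum_x b_x^2 \right]^{1/2}$$

gives the inequality 

$$\|\pi - \nu\|_{L^2} \ge 2 \, \|\pi - \nu\|_{TV}.$$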
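
Both distances are simple to compute for probability vectors on a finite state 
space. The sketch below (hypothetical function names) uses the standard fact 
that the total variation distance is half the sum of the absolute differences, 
and compares a point mass with the uniform distribution on four states. 

```python
import math


def total_variation(pi, nu):
    """Total variation distance: half the sum of absolute differences."""
    return 0.5 * sum(abs(p - q) for p, q in zip(pi, nu))


def l2_distance(pi, nu):
    """L^2 distance with respect to the uniform measure 1/N on N states."""
    n = len(pi)
    return math.sqrt(sum((n * p - n * q) ** 2 for p, q in zip(pi, nu)) / n)


N = 4
uniform = [1.0 / N] * N
point_mass = [1.0, 0.0, 0.0, 0.0]

tv = total_variation(point_mass, uniform)
l2 = l2_distance(point_mass, uniform)
```

For a point mass the total variation distance is $1 - 1/N$, which stays below 
1, while the $L^2$ distance is $\sqrt{N - 1}$ and grows with the size of the 
state space. 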


Example 1. Consider the chain with rates $\alpha(i, j) = b/N$, $i \ne j$, where 
$b > 0$. For any $j$ the vector $v$ with 


is a right eigenvector with eigenvalue $-b$. There is an $N - 1$ dimensional 
subspace of such eigenvectors; hence the eigenvalues for $A$ are 0 with 
multiplicity 1 and $-b$ with multiplicity $N - 1$. If $\nu$ is any probability 
vector, we can give an exact expression for $e^{tA}\nu$. Suppose we start in state 
$x$. This chain starts with distribution $\nu$, waits for an exponential "alarm 
clock" with rate $b$ (mean $1/b$) to ring, and then chooses one of the $N$ sites 
from the uniform distribution. If we let $\pi$ denote the uniform distribution, 
then 

$$e^{tA}\nu = e^{-bt}\nu + (1 - e^{-bt})\pi.$$

The $e^{-bt}$ term denotes the probability that the alarm clock has not gone off. 
Therefore, 

$$\|e^{tA}\nu - \pi\|_{TV} = e^{-bt} \|\nu - \pi\|_{TV} \le e^{-bt},$$

$$\|e^{tA}\nu - \pi\|_{L^2} = e^{-bt} \|\nu - \pi\|_{L^2}.$$

If the chain starts at $x$, so that $\nu(x) = 1$, then 
$\|\nu - \pi\|_{L^2} \approx \sqrt{N}$, so the $L^2$ distance is still large. 


Despite its limitations, we will focus on bounding the rate of convergence 
in the $L^2$-distance, because techniques of linear algebra can be used. If $A$ is 
a symmetric matrix, then it can be shown (see an advanced book on linear 
algebra) that there is a complete set of eigenvalues and eigenvectors. Moreover, 
all the eigenvalues are real so we can write the eigenvalues in decreasing 
order, 

$$0 = \lambda_1 > \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_N.$$


We know $\lambda_2 < 0$ because the chain is irreducible. By symmetry, we see that 
if $\langle \cdot, \cdot \rangle$ denotes inner product, 

$$\langle A\bar{v}, \bar{w} \rangle = \langle \bar{v}, A\bar{w} \rangle = \sum_{i=1}^{N} \sum_{j=1}^{N} v^i w^j A(i, j). \qquad (7.1)$$

A matrix satisfying the first equality is said to be self-adjoint (with respect 
to the uniform measure) and the expression on the right is often called the 
quadratic form associated with the matrix. 

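Let 

$$\bar{1} = \bar{v}_1, \bar{v}_2, \ldots, \bar{v}_N,$$

be the eigenvectors for $A$, which are both right and left eigenvectors since $A$ 
is symmetric. Using (7.1) we can see that 

$$\lambda_j \langle \bar{v}_j, \bar{v}_k \rangle = \langle A\bar{v}_j, \bar{v}_k \rangle = \langle \bar{v}_j, A\bar{v}_k \rangle = \lambda_k \langle \bar{v}_j, \bar{v}_k \rangle,$$

and hence eigenvectors for different eigenvalues are orthogonal 
($\langle \bar{v}_j, \bar{v}_k \rangle = 0$). We can therefore choose the 
eigenvectors so they are all orthogonal. These eigenvectors are also the 
eigenvectors for the matrix $e^{tA}$ with corresponding eigenvalues 
$e^{t\lambda_j}$, 

$$e^{tA} \bar{v}_j = e^{t\lambda_j} \bar{v}_j.$$

Let $U \subset R^N$ denote the $N - 1$ dimensional subspace generated by the 
vectors $\{\bar{v}_2, \ldots, \bar{v}_N\}$, or equivalently, the set of vectors 
$\bar{w}$ satisfying 

$$\sum_{i=1}^{N} w^i = 0.$$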


By writing any $\bar{w} \in U$ as a linear combination of left eigenvectors, we 
can easily see that 

$$\|\bar{w} e^{tA}\| \le e^{t\lambda_2} \|\bar{w}\|,$$

where $\|\bar{w}\|^2 = \sum_{i=1}^{N} [w^i]^2$. Now suppose we start the chain with 
any probability vector $\bar{\nu}$. We can write 

$$\bar{\nu} = \bar{\pi} + \bar{w},$$

where $\bar{\pi} = (1/N)\bar{1}$ is the invariant probability and 
$\bar{w} = \bar{\nu} - \bar{\pi} \in U$. Since $\bar{\pi} e^{tA} = \bar{\pi}$, we can 
conclude 

$$\|\bar{\nu} e^{tA} - \bar{\pi}\|_{L^2} = \|(\bar{\nu} - \bar{\pi}) e^{tA}\|_{L^2} \le e^{t\lambda_2} \|\bar{\nu} - \bar{\pi}\|_{L^2}.$$

What we see is that the rate of convergence is essentially controlled by the 
size of $\lambda_2$, and if we can get lower bounds on $|\lambda_2|$, we can bound 
the rate of convergence. 


Example 2. Consider simple random walk on a circle, i.e., the chain with
state space $S = \{1, \ldots, N\}$ and rates $\alpha(x,y) = 1/2$ if
$|x - y| \equiv 1 \pmod N$. This is reversible with respect to the uniform measure
on $S$. The eigenvalues for $A$ can be found exactly in this case (see Exercise 7.9),

$$\lambda_j = \cos\left(\frac{(j-1)2\pi}{N}\right) - 1, \qquad j = 1, \ldots, N.$$

In particular, $\lambda_2 = \cos(2\pi/N) - 1$ which for large $N$ (by the Taylor series
for cosine) looks like $-2\pi^2 N^{-2}$. This says that it takes on the order of
$N^2$ time units for the distribution to be within $e^{-1}$ of the uniform
distribution. It makes sense that it takes on order $N^2$ steps to get close to
equilibrium, if we remember that it takes a random walker on the order of $N^2$
steps to go a distance of about N. 
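
The eigenvalue formula of Example 2 can be checked numerically. The sketch below is only an illustration: it assumes the generator has diagonal entries $-1$ and entries $1/2$ between neighbors on the circle, and it uses the trial eigenvectors $v(k) = \cos(2\pi(j-1)k/N)$, which are a choice made here and are not taken from the text. The function names are hypothetical.

```python
import math

def circle_generator(n):
    """Generator of the circle walk (n >= 3): A(i,i) = -1, A(i, i +/- 1 mod n) = 1/2."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = -1.0
        A[i][(i + 1) % n] += 0.5
        A[i][(i - 1) % n] += 0.5
    return A

def eigen_residual(n, j):
    """Largest entry of |A v - lambda_j v| for v(k) = cos(2 pi (j-1) k / n)."""
    A = circle_generator(n)
    lam = math.cos(2 * math.pi * (j - 1) / n) - 1.0
    v = [math.cos(2 * math.pi * (j - 1) * k / n) for k in range(n)]
    Av = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    return max(abs(Av[i] - lam * v[i]) for i in range(n))

n = 50
worst = max(eigen_residual(n, j) for j in range(1, n + 1))  # should be ~ rounding error
lambda_2 = math.cos(2 * math.pi / n) - 1.0                  # exact second eigenvalue
approx = -2 * math.pi ** 2 / n ** 2                         # Taylor approximation
print(worst, lambda_2, approx)
```

For moderately large $N$ the exact value of $\lambda_2$ and the approximation $-2\pi^2/N^2$ agree to several digits, which is the content of the Taylor series remark.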


Example 3. Let the state space $S$ be all binary sequences of length $N$, i.e., all
$N$-tuples $(a_1, \ldots, a_N)$, $a_i \in \{0,1\}$. Note that the state space has $2^N$
elements. Consider the chain with $\alpha(x,y) = 1/N$ if $x$ and $y$ are two sequences
that differ in exactly one component and $\alpha(x,y) = 0$ otherwise. This is
sometimes called random walk on the $N$-dimensional hypercube. Clearly this is
reversible with respect to the uniform measure. It can be shown that $-2j/N$ is an
eigenvalue with multiplicity $\binom{N}{j}$. In this case, $\lambda_2 = -2/N$ and it
takes on order $N$ steps
to get close to equilibrium. This can be understood intuitively by noting that 
if the number of steps is of order N, most components have had an opportunity 
to change at least once. 
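
The eigenvalues of the hypercube walk can be verified directly for small $N$. The sketch below assumes each of the $N$ neighbors is reached at rate $1/N$ (the normalization under which $-2j/N$ is an eigenvalue) and uses the standard trial functions $\chi(x) = (-1)^{\sum_{i \in B} x_i}$ for a set $B$ of coordinates; these functions and the name `hypercube_eigen_residual` are illustrative assumptions, not part of the text.

```python
from itertools import product

def hypercube_eigen_residual(n, subset):
    """Largest |(A chi)(x) - lam * chi(x)| with lam = -2|subset|/n.

    chi(x) = (-1)^(sum of x_i over i in subset); rates are 1/n per neighbor.
    """
    lam = -2.0 * len(subset) / n

    def chi(x):
        return -1.0 if sum(x[i] for i in subset) % 2 else 1.0

    worst = 0.0
    for x in product((0, 1), repeat=n):
        a_chi = 0.0
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]   # flip coordinate i
            a_chi += (chi(y) - chi(x)) / n        # generator acts by differences
        worst = max(worst, abs(a_chi - lam * chi(x)))
    return worst

for subset in [(), (0,), (1, 3), (0, 1, 2, 3, 4)]:
    print(len(subset), hypercube_eigen_residual(5, subset))
```

Since there are $\binom{N}{j}$ ways to choose a set of $j$ coordinates, this is consistent with the stated multiplicity.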


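Now let $U$ be the set of vectors that are orthogonal to $\bar 1$, i.e., the set of
vectors $\bar w$ satisfying

$$\sum_{i=1}^{N} w^i = 0.$$

If $\bar w \in U$, then $A\bar w \in U$. If we write

$$\bar w = a_2 \bar v_2 + \cdots + a_N \bar v_N,$$

with $a_i = \langle \bar v_i, \bar w \rangle$, we see that

$$\langle \bar w, A\bar w \rangle
  = \sum_{i=2}^{N} \sum_{j=2}^{N} \langle a_i \bar v_i, a_j A \bar v_j \rangle
  = \sum_{i=2}^{N} \sum_{j=2}^{N} a_i a_j \lambda_j \langle \bar v_i, \bar v_j \rangle$$

$$= \sum_{i=2}^{N} \lambda_i \langle a_i \bar v_i, a_i \bar v_i \rangle
  \le \lambda_2 \sum_{i=2}^{N} \langle a_i \bar v_i, a_i \bar v_i \rangle
  = \lambda_2 \langle \bar w, \bar w \rangle.$$

Also, we get equality in the above expression if we choose $\bar w = \bar v_2$. What
we have derived is the Rayleigh-Ritz variational formulation for the second
eigenvalue,

$$\lambda_2 = \sup_{\bar w \in U,\ \bar w \ne 0}
  \frac{\langle \bar w, A\bar w \rangle}{\langle \bar w, \bar w \rangle}.$$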


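Lower bounds for $\lambda_2$ (i.e., upper bounds on $|\lambda_2|$) can be obtained by
considering particular $\bar w \in U$. If $T \subset S$, let $\bar w \in U$ with
components

$$w^i = \begin{cases} 1 - \pi(T), & i \in T, \\ -\pi(T), & i \notin T. \end{cases}$$

If $i \in T$,

$$(A\bar w)^i = \sum_{j} A(i,j) w^j
  = -\alpha(i)[1 - \pi(T)] + \sum_{j \in T,\, j \ne i} \alpha(j,i)[1 - \pi(T)]
    - \sum_{j \notin T} \alpha(j,i)\pi(T)$$

$$= -\sum_{j \notin T} \alpha(j,i)[1 - \pi(T)] - \sum_{j \notin T} \alpha(j,i)\pi(T)
  = -\sum_{j \notin T} \alpha(j,i).$$

Similarly, if $i \notin T$,

$$(A\bar w)^i = \sum_{j \in T} \alpha(j,i).$$

Therefore,

$$\langle \bar w, A\bar w \rangle = \sum_{i} w^i (A\bar w)^i
  = \sum_{i \in T} [1 - \pi(T)] \sum_{j \notin T} [-\alpha(j,i)]
    + \sum_{i \notin T} [-\pi(T)] \sum_{j \in T} \alpha(j,i)$$

$$= -\sum_{i \in T} \sum_{j \notin T} \alpha(i,j).$$

Define $\kappa$ by

$$\kappa = \inf_{T \subset S}
  \frac{\sum_{i \in T} \sum_{j \notin T} \alpha(i,j)\pi(i)}{\pi(T)[1 - \pi(T)]}.$$

Then by considering this choice of $\bar w$ in the Rayleigh-Ritz formulation, we
have

$$|\lambda_2| \le \kappa.$$

Unfortunately this bound is often not very good. A large area of research is
concerned with finding better ways to estimate $\lambda_2$; we do not discuss this any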
further in this book. 
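
To get a feeling for how far $\kappa$ can be from $|\lambda_2|$, one can compute $\kappa$ by brute force for a small example. The sketch below is an illustration under stated assumptions: it uses the circle walk of Example 2 with rates $1/2$ and uniform $\pi$, enumerates every proper nonempty subset $T$, and compares the result with the exact value $|\lambda_2| = 1 - \cos(2\pi/N)$. The function name `kappa_circle` is hypothetical.

```python
import math
from itertools import combinations

def kappa_circle(n):
    """Brute-force kappa for the circle walk with rates 1/2 and uniform pi."""
    pi_i = 1.0 / n
    best = float("inf")
    for size in range(1, n):                  # proper, nonempty subsets T
        for T in combinations(range(n), size):
            in_T = set(T)
            flow = 0.0                        # sum over i in T, j not in T of alpha(i,j) pi(i)
            for i in T:
                for j in ((i + 1) % n, (i - 1) % n):
                    if j not in in_T:
                        flow += 0.5 * pi_i
            pi_T = size * pi_i
            best = min(best, flow / (pi_T * (1.0 - pi_T)))
    return best

n = 8
kappa = kappa_circle(n)
gap = abs(math.cos(2 * math.pi / n) - 1.0)    # |lambda_2| for the circle
print(kappa, gap)
```

The enumeration is exponential in $N$, so this is only usable for very small state spaces; it is meant to illustrate the definition, not to be a practical way to estimate $\lambda_2$.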


7.3 Markov Chain Algorithms 


A recent application of Markov chain theory has been in Monte Carlo simulations
of random systems. The idea of Monte Carlo simulations is simple: to understand a
random system one does many trials on a computer and sees how it behaves. These
simulations always use a random number generator, generally a function that
produces independent numbers distributed uniformly between 0 and 1. (Actually, a
computer can only produce pseudo-random numbers and there are important questions
as to whether pseudo-random number generators are "random" enough. We will not
worry about that question here and will just assume that we have a means to
generate independent identically distributed numbers $U_1, U_2, \ldots$ distributed
uniformly on $[0,1]$.)

As an example, suppose we were interested in studying properties of "random"
matrices whose entries are 0s and 1s. As a probability space we could
choose the set $S$ of $N \times N$ matrices $M$, with

$$M(i,j) = 0 \text{ or } 1, \qquad 1 \le i,j \le N.$$

A natural probability measure would be the uniform measure on all $2^{N^2}$ such
matrices. Writing an algorithm to produce a random matrix from this distribution
is easy: choose $N^2$ uniform random numbers $U(i,j)$, $1 \le i,j \le N$,
and set

$$M(i,j) = \begin{cases} 0, & U(i,j) < 1/2, \\ 1, & U(i,j) \ge 1/2. \end{cases}$$

It takes on the order of $N^2$ operations to produce one $N \times N$ matrix, and
clearly every matrix in S has the same chance of being produced. 
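
A minimal sketch of this direct sampling, assuming each uniform number is thresholded at $1/2$ (one natural choice) and using Python's standard generator as the source of the $U(i,j)$; the function name is hypothetical.

```python
import random

def uniform_binary_matrix(n, rng=random):
    """Return an n x n matrix (list of lists) with independent entries,
    each equal to 0 or 1 with probability 1/2."""
    return [[0 if rng.random() < 0.5 else 1 for _ in range(n)]
            for _ in range(n)]

M = uniform_binary_matrix(6, random.Random(0))
for row in M:
    print(row)
```

Passing an explicit `random.Random(seed)` instance makes a run reproducible, which is convenient when comparing simulations.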

Now suppose we change our probability space and say we are only interested
in matrices in $S$ that have no two 1s together. Let $T$ be the matrices in $S$
with no two 1s together, i.e., the matrices $M \in S$ such that

$$M(i-1,j) = M(i+1,j) = M(i,j-1) = M(i,j+1) = 0,$$

if $M(i,j) = 1$. Suppose also we want to put the uniform probability measure
on $T$ (this is a natural measure from the perspective of statistical physics
where 1s can denote particles and there is a repulsive interaction that keeps
particles from getting too close together). While it is easy to define this
measure, it is a hard problem to determine $c(N)$, the number of elements of
$T$. It can be shown that there is a constant $\beta \in (1,2)$ such that

$$\lim_{N \to \infty} c(N)^{1/N^2} = \beta$$

(so that the number of elements in $T$ is approximately $\beta^{N^2}$) but the exact
value of $\beta$ is not known. Still we might be interested in the properties of such
matrices and hence would like to sample from the uniform distribution on $T$.

While it is very difficult to give an efficient algorithm that exactly samples
from the uniform distribution (and even if we had one, the errors in the
random number generation would keep it from being an exact sampling), we
can give a very efficient algorithm that produces samples from an almost
uniform distribution. What we do is run an irreducible Markov chain with
state space $T$ whose invariant measure is the uniform distribution. We can
then start with any matrix in $T$; run the chain long enough so that the chain
is near equilibrium; and then choose the matrix we have at that point.

For this example, one algorithm is as follows: 1) start with any matrix
$M \in T$, e.g., the matrix with all zero entries; 2) choose one of the entries at
random, i.e., choose an ordered pair $(i,j)$ from the uniform distribution on
the $N^2$ ordered pairs; and 3) consider the matrix gotten by changing only the
$(i,j)$ entry of $M$. If this new matrix is in $T$, we let this be the new value of
the chain; if the new matrix is not in $T$, we make no change in the value of
the chain; return to 2). This algorithm is a simulation of the discrete-time
Markov chain with state space $T$ and transition probabilities

$$P(M, M') = N^{-2},$$

if $M, M' \in T$ differ in exactly one entry; $P(M, M') = 0$ if $M$ and $M'$ differ
by more than one entry; and $P(M, M)$ is whatever is necessary so that the
rows add up to 1. Clearly, $P$ is a symmetric matrix and it is not too difficult
to see that it is irreducible. Hence $P$ is a reversible Markov chain with state
space $T$ and its invariant distribution is the uniform measure.

Of course, we need to know how long to run the chain in order to guarantee 
that one is close to the invariant distribution. As noted in the previous section, 
this boils down to estimating the second eigenvalue for the Markov chain. 
Unfortunately, estimating this eigenvalue is often much more difficult than 
showing that the chain has the right invariant measure (which is quite easy in 
this example). In this example, we clearly need at least $N^2$ steps to get close,
since each of the entries should have a good chance to be changed. 
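
The three steps of the algorithm above can be sketched as follows. This is an illustration only: the function names are hypothetical, entries on the boundary are assumed to have fewer neighbors (no wrap-around), and the number of steps is left as a parameter since the text does not say how long the chain must be run.

```python
import random

def neighbors_all_zero(M, i, j):
    """True if every existing nearest neighbor of (i, j) is 0."""
    n = len(M)
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        a, b = i + di, j + dj
        if 0 <= a < n and 0 <= b < n and M[a][b] == 1:
            return False
    return True

def run_chain(n, steps, rng):
    """Run the chain on T started from the all-zero matrix."""
    M = [[0] * n for _ in range(n)]
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)   # uniform entry
        if M[i][j] == 1:
            M[i][j] = 0            # removing a 1 always stays in T
        elif neighbors_all_zero(M, i, j):
            M[i][j] = 1            # adding a 1 is allowed only here
        # otherwise the proposed matrix is not in T: no change
    return M

def in_T(M):
    """Check that no two 1s are adjacent."""
    n = len(M)
    return all(M[i][j] == 0 or neighbors_all_zero(M, i, j)
               for i in range(n) for j in range(n))

M = run_chain(10, 20000, random.Random(1))
print(in_T(M), sum(map(sum, M)))
```

Only the four neighbors of the chosen entry need to be inspected at each step, so one step costs a bounded number of operations regardless of $N$.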

We will give some other examples of where these kinds of algorithms have 
been used. In all of these cases the algorithms are fairly efficient, although in 
some cases only partial rigorous analysis has been given. 


Example 1. Ising Model. Let $S$ be the set of $N \times N$ matrices with entries
1 or $-1$. For any $M \in S$ we define the "energy" of the matrix by

$$H(M) = -\sum_{(i,j) \sim (i',j')} M(i,j) M(i',j'),$$

where $(i,j) \sim (i',j')$ if the entries are "nearest neighbors,"

$$|i - i'| + |j - j'| = 1.$$

The value $M(i,j)$ is called the "spin" at site $(i,j)$ and the energy is
minimized when all the spins are the same. The Ising model gives a probability
distribution on $S$ that weights matrices of low energy the highest. For any
$a > 0$ we let

$$\pi_a(M) = \frac{\exp\{-aH(M)\}}{\sum_{M' \in S} \exp\{-aH(M')\}}.$$

This is a well-defined probability measure, although it is difficult to calculate
the normalization factor

$$Z(a) = \sum_{M' \in S} \exp\{-aH(M')\}.$$

If $M$ and $M'$ are two matrices that agree in all but one entry, we can calculate
$\pi_a(M)/\pi_a(M')$ easily without calculating $Z(a)$.
Write $M \sim M'$ if $M$ and $M'$ differ in exactly one entry. We define $P_a$ by

$$P_a(M, M') = \frac{1}{N^2} \min\left\{1, \frac{\pi_a(M')}{\pi_a(M)}\right\},
  \qquad M \sim M',$$

and

$$P_a(M, M) = 1 - \sum_{M' \sim M} P_a(M, M').$$

In other words, one runs an algorithm as follows: 1) start with a matrix $M$;
2) choose an entry of the matrix at random and let $M'$ be the matrix which
agrees with $M$ everywhere except at that entry; 3) move to matrix $M'$ with
probability $\min\{1, \pi_a(M')/\pi_a(M)\}$ and otherwise stay at the matrix $M$. It is
easy to check that this is an irreducible Markov chain reversible with respect
to $\pi_a$.


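Example 2. The above example is a specific case of a general algorithm.
Suppose $G = (V, E)$ is a connected graph such that each vertex is adjacent to
at most $K$ other vertices. Suppose a positive function $f$ on $V$ is given, and
let $\pi$ be the probability measure

$$\pi(v) = \frac{f(v)}{\sum_{w \in V} f(w)}.$$

Write $v \sim w$ if $(v,w) \in E$. We set

$$P(v, w) = \frac{1}{K} \min\left\{1, \frac{f(w)}{f(v)}\right\}, \qquad v \sim w,$$

and

$$P(v, v) = 1 - \sum_{w \sim v} P(v, w).$$

Then $P$ is an irreducible Markov chain, reversible with respect to $\pi$.
Algorithms of this type are often referred to as Metropolis algorithms.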
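
Reversibility of a Metropolis chain can be checked exactly on a small graph. The sketch below is an assumption-laden illustration: it takes the proposal to pick each neighbor with probability $1/K$ and to accept with probability $\min\{1, f(w)/f(v)\}$, and then verifies the detailed balance condition $f(v)P(v,w) = f(w)P(w,v)$. The graph, the weights, and the function name are made up for the example.

```python
def metropolis_matrix(neighbors, f, K):
    """Transition probabilities of a Metropolis chain.

    neighbors: dict vertex -> list of adjacent vertices (no self-loops)
    f:         dict vertex -> positive weight
    K:         upper bound on the number of neighbors of any vertex
    """
    P = {v: {} for v in neighbors}
    for v, nbrs in neighbors.items():
        for w in nbrs:
            P[v][w] = min(1.0, f[w] / f[v]) / K
        P[v][v] = 1.0 - sum(P[v].values())   # holding probability
    return P

# A path 0 - 1 - 2 - 3, so every vertex has at most K = 2 neighbors.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
f = {0: 1.0, 1: 2.0, 2: 4.0, 3: 0.5}
P = metropolis_matrix(neighbors, f, K=2)

for v in neighbors:
    for w in neighbors[v]:
        # detailed balance: f(v) P(v,w) = f(w) P(w,v) = min(f(v), f(w)) / K
        print(v, w, f[v] * P[v][w], f[w] * P[w][v])
```

Note that only ratios $f(w)/f(v)$ are used, so the normalizing sum in $\pi$ never has to be computed.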


Example 3. There is another class of algorithms, called Gibbs samplers,
which are similar. Suppose we have $n$ variables $(x_1, \ldots, x_n)$ each of which
can take on one of $K$ values, say $\{a_1, \ldots, a_K\}$. Let $S$ be the set of $K^n$
possible $n$-tuples and assume we have a positive function $f$ on $S$. We want to
sample from the distribution

$$\pi(x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n)}
  {\sum_{(y_1, \ldots, y_n) \in S} f(y_1, \ldots, y_n)}. \qquad (7.2)$$

Our algorithm is to choose a $j \in \{1, \ldots, n\}$ at random and then change $x_j$
to $z$ according to the conditional probability

$$\pi(z \mid x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n)
  = \frac{f(x_1, \ldots, x_{j-1}, z, x_{j+1}, \ldots, x_n)}
         {\sum_{k=1}^{K} f(x_1, \ldots, x_{j-1}, a_k, x_{j+1}, \ldots, x_n)}.$$

This gives the transition probability

$$P((x_1, \ldots, x_n), (y_1, \ldots, y_n))
  = \frac{1}{n} \, \frac{f(x_1, \ldots, x_{j-1}, y_j, x_{j+1}, \ldots, x_n)}
         {\sum_{k=1}^{K} f(x_1, \ldots, x_{j-1}, a_k, x_{j+1}, \ldots, x_n)},
  \qquad x_i = y_i,\ i \ne j,\ x_j \ne y_j,$$

and $P((x_1, \ldots, x_n), (x_1, \ldots, x_n))$ equal to whatever is necessary to make
the rows sum to 1. Again it is straightforward to check that this is an irreducible
Markov chain, reversible with respect to $\pi$. Note also that to run the chain
we never need to calculate the denominator in (7.2).

The Ising model can be considered one example with $n = N^2$, $K = 2$, and
the possible values $-1, 1$. In this case we get

$$P(M, M') = \frac{1}{N^2} \,
  \frac{\exp\{-aH(M')\}}{\exp\{-aH(M)\} + \exp\{-aH(M')\}},$$


if M and M’ differ in exactly one entry. 
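
For the Ising model the conditional probability in the Gibbs sampler depends only on the four neighbors of the chosen site. The sketch below assumes that each nearest-neighbor pair is counted once in $H$ and that sites on the boundary simply have fewer neighbors; under those assumptions the probability of setting the chosen spin to $+1$ is $1/(1 + e^{-2ah})$, where $h$ is the sum of the neighboring spins. All function names are hypothetical.

```python
import math
import random

def energy(M):
    """H(M) = -sum over nearest-neighbor pairs (each counted once)."""
    n = len(M)
    h = 0
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                h -= M[i][j] * M[i + 1][j]
            if j + 1 < n:
                h -= M[i][j] * M[i][j + 1]
    return h

def neighbor_sum(M, i, j):
    n = len(M)
    total = 0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        p, q = i + di, j + dj
        if 0 <= p < n and 0 <= q < n:
            total += M[p][q]
    return total

def prob_plus(M, i, j, a):
    """Conditional probability that the spin at (i, j) is +1 given the rest."""
    return 1.0 / (1.0 + math.exp(-2.0 * a * neighbor_sum(M, i, j)))

def gibbs_step(M, a, rng):
    """Pick a site uniformly and resample its spin from the conditional law."""
    n = len(M)
    i, j = rng.randrange(n), rng.randrange(n)
    M[i][j] = 1 if rng.random() < prob_plus(M, i, j, a) else -1

rng = random.Random(1)
M = [[rng.choice((-1, 1)) for _ in range(4)] for _ in range(4)]
for _ in range(1000):
    gibbs_step(M, 0.4, rng)
print(energy(M))
```

The local formula avoids recomputing the full energy, which would cost on the order of $N^2$ operations per step.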


7.4 A Criterion for Recurrence


In this section we develop a useful monotonicity result for random walks
with symmetric rates. To illustrate the usefulness of the result consider two
possible rates on $\mathbb{Z}^2$. The first is $\alpha(x,y) = 1$ if $|x - y| = 1$ and 0
otherwise. This corresponds to simple random walk which we have already seen is
recurrent in two dimensions. For the other rate, suppose we remove some edges from
the integer lattice as illustrated below. More precisely, suppose we have a subset
$B$ of the edges of the lattice and state that $\alpha(x,y) = 1$ only if the edge
$(x,y)$ is contained in $B$.




What our result will say is that for any such subset $B$ the corresponding
chain is still recurrent. Assume we have a graph $G = (V, E)$ and two
symmetric rate functions $\alpha$ and $\beta$ on $E$.


Fact. If $\alpha$ produces a recurrent chain and $\beta(x,y) \le \alpha(x,y)$ for all
$(x,y)$, then $\beta$ also produces a recurrent chain.


The proof of this statement takes a little work. We start with some preliminary
remarks. Suppose we write the elements of $V$ as $\{x_0, x_1, x_2, \ldots\}$ (we
will assume $V$ is infinite, for otherwise the chains are always recurrent). Let
$A_n = \{x_0, x_n, x_{n+1}, \ldots\}$. Let us start the chain at $x_0$, wait until it
leaves $x_0$ for the first time, and then see what point in $A_n$ is hit first by the
chain. Let $h_n(x_0) = h_n(x_0; \alpha)$ be the probability that the first such point
hit is not $x_0$ (using transition rates $\alpha$). Then it is not too difficult to
convince oneself that the chain is recurrent if and only if

$$\lim_{n \to \infty} h_n(x_0) = 0. \qquad (7.3)$$

It is the goal of this section to give a formulation of $h_n(x_0)$ that will allow us
to conclude the monotonicity result.

For this section we will assume that a graph $G = (V, E)$ is given as well as
a symmetric transition rate $\alpha : E \to [0, \infty)$. Let $A$ be a subset of $V$
and fix $x_0 \in A$. Let $X_t$ be a continuous time Markov chain with rates $\alpha$
and let $\tau$ be the infimum of all $t \ge 0$ such that $X_t \in A$. Define $f(y)$ to
be the probability starting at $y$ that the first visit to $A$ occurs at the point
$x_0$,

$$f(y) = P\{X_\tau = x_0 \mid X_0 = y\}.$$

It is easy to see that $f(x_0) = 1$ and $f(y) = 0$ for $y \in A$, $y \ne x_0$. Suppose
$y \notin A$. Then the probability that the first new site that $y$ visits is $z$ is
$\alpha(y,z)/\alpha(y)$, where again we write $\alpha(y) = \sum_{z \ne y} \alpha(y,z)$.
By concentrating on this first move, we see that

$$f(y) = \sum_{z \in V} P\{\text{first new site is } z\} f(z)$$

or

$$\alpha(y) f(y) = \sum_{z \in V} \alpha(y,z) f(z). \qquad (7.4)$$

A function $f$ satisfying (7.4) is called $\alpha$-harmonic at $y$. We have shown that
our given $f$ is $\alpha$-harmonic at all $y \notin A$, and one can show with a little
more work that $f$ is the unique function that is $\alpha$-harmonic at $y \notin A$ and
that satisfies the boundary condition $f(x_0) = 1$, $f(y) = 0$, $y \in A$, $y \ne x_0$.

We will now characterize $f$ as the function that minimizes a particular
functional (a functional is a real-valued function of a function). For any
function $g$ let

$$Q_\alpha(g) = \sum_{x \in V} \sum_{y \in V} \alpha(x,y) (g(x) - g(y))^2.$$

Suppose we consider only those functions $g$ that satisfy the boundary condition
$g(x_0) = 1$, $g(y) = 0$, $y \in A$, $y \ne x_0$. Let $g$ be the function satisfying
this boundary condition which minimizes $Q_\alpha$. Then at any $y \notin A$,
perturbations of $g$ at $y$, leaving all other values fixed, should increase
$Q_\alpha$. In other words if we define $g_\epsilon(z)$ by

$$g_\epsilon(z) = \begin{cases} g(z), & z \ne y, \\ g(y) + \epsilon, & z = y, \end{cases}$$

then

$$Q_\alpha(g_\epsilon) \ge Q_\alpha(g).$$

A simple calculation shows that this holds if and only if for every $y \notin A$,

$$\sum_{z \in V} g(z)\alpha(y,z) = \sum_{z \in V} g(y)\alpha(y,z) = g(y)\alpha(y).$$


In other words $g$ is the function that is $\alpha$-harmonic at each $y \notin A$ and
satisfies the boundary conditions. Since $f$ is the only such function, $g = f$.
Summarizing, $f$, as defined above, is also the function that minimizes $Q_\alpha(g)$
subject to the boundary condition $g(x_0) = 1$, $g(y) = 0$, $y \in A$, $y \ne x_0$.

We now use "summation by parts" to give another expression for $Q_\alpha(f)$.
We start by writing

$$Q_\alpha(f) = \sum_{x \in V} \sum_{y \in V} \alpha(x,y) (f(x) - f(y))^2$$

$$= \sum_{x \in V} \sum_{y \in V} \alpha(x,y) f(x) (f(x) - f(y))
  + \sum_{x \in V} \sum_{y \in V} \alpha(x,y) f(y) (f(y) - f(x))$$

$$= 2 \sum_{x \in V} \sum_{y \in V} \alpha(x,y) f(x) (f(x) - f(y)).$$

The last equality uses the symmetry of $\alpha$. Since $f(x_0) = 1$ and $f(y) = 0$,
$y \in A$, $y \ne x_0$, we can write this as

$$2 \sum_{y \in V} \alpha(x_0, y)(1 - f(y))
  + 2 \sum_{x \notin A} f(x) \sum_{y \in V} \alpha(x,y)(f(x) - f(y)).$$

But, if $x \notin A$, then $f$ is $\alpha$-harmonic at $x$,

$$\sum_{y \in V} \alpha(x,y) f(y) = \sum_{y \in V} \alpha(x,y) f(x) = \alpha(x) f(x).$$

Hence the second term in the sum is 0 and we get

$$Q_\alpha(f) = 2 \sum_{y \in V} \alpha(x_0, y)(1 - f(y))
  = 2\alpha(x_0) \sum_{y \in V} \frac{\alpha(x_0, y)}{\alpha(x_0)} (1 - f(y)).$$

Now let $h(x_0)$ be the probability that the chain starting at $x_0$ makes its first
visit to $A$, after leaving $x_0$ for the first time, at some point other than $x_0$.
By considering the first step, we see that

$$h(x_0) = \sum_{y \in V} \frac{\alpha(x_0, y)}{\alpha(x_0)} (1 - f(y))
  = \frac{Q_\alpha(x_0, A)}{2\alpha(x_0)},$$

where

$$Q_\alpha(x_0, A) = \inf \sum_{x \in V} \sum_{y \in V} \alpha(x,y)(g(x) - g(y))^2,$$

and the infimum is taken over all functions $g$ satisfying $g(x_0) = 0$ and
$g(y) = 1$, $y \in A$, $y \ne x_0$ (replacing $g$ by $1 - g$ does not change the sum,
so this is the same infimum as with the boundary condition $g(x_0) = 1$, $g(y) = 0$).
The beauty of the formula comes in the realization that if $\beta(x,y)$ is another
collection of rates with $\beta(x,y) \le \alpha(x,y)$ for all $x, y \in V$, then

$$Q_\beta(x_0, A) \le Q_\alpha(x_0, A).$$

If we now write $h_n(x_0; \alpha)$ and $h_n(x_0; \beta)$ as in the beginning of this
section, we see that we have shown that if $\beta(x,y) \le \alpha(x,y)$ for all $x, y$,

$$\frac{\beta(x_0)}{\alpha(x_0)} \, h_n(x_0; \beta) \le h_n(x_0; \alpha).$$

In particular, if we use the criterion given in (7.3), we see that if the chain
with rates $\alpha$ is recurrent, then the chain with rates $\beta$ is also recurrent.
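
The identity $Q_\alpha(f) = 2\alpha(x_0)h(x_0)$ and the monotonicity in the rates can be checked numerically on a small finite graph. The sketch below is an illustration under stated assumptions: the graph is a $3 \times 3$ grid, $x_0$ is one corner, $A$ consists of $x_0$ and the opposite corner, the harmonic function is found by simple Gauss-Seidel iteration, and the second set of rates is smaller but still strictly positive. All names and the choice of graph are made up for the example.

```python
def grid_edges(n, horizontal_rate, vertical_rate):
    """Symmetric rates on the edges of an n x n grid."""
    edges = {}
    for i in range(n):
        for j in range(n):
            if j + 1 < n:
                edges[((i, j), (i, j + 1))] = horizontal_rate
            if i + 1 < n:
                edges[((i, j), (i + 1, j))] = vertical_rate
    return edges

def build_adj(edges):
    adj = {}
    for (x, y), r in edges.items():
        adj.setdefault(x, []).append((y, r))
        adj.setdefault(y, []).append((x, r))
    return adj

def solve_f(adj, A, x0, sweeps=2000):
    """f(x0) = 1, f = 0 on the rest of A, harmonic elsewhere (Gauss-Seidel)."""
    f = {v: 0.0 for v in adj}
    f[x0] = 1.0
    for _ in range(sweeps):
        for y in adj:
            if y not in A:
                total = sum(r for _, r in adj[y])
                f[y] = sum(r * f[z] for z, r in adj[y]) / total
    return f

def energy_Q(adj, f):
    """Double sum over ordered pairs (x, y) of rate * (f(x) - f(y))^2."""
    return sum(r * (f[x] - f[y]) ** 2 for x in adj for y, r in adj[x])

def escape_h(adj, f, x0):
    """Probability that the first visit to A after leaving x0 is not at x0."""
    total = sum(r for _, r in adj[x0])
    return sum(r * (1.0 - f[y]) for y, r in adj[x0]) / total

x0, A = (0, 0), {(0, 0), (2, 2)}
adj_a = build_adj(grid_edges(3, 1.0, 1.0))    # rates alpha
adj_b = build_adj(grid_edges(3, 0.5, 1.0))    # rates beta <= alpha
f_a, f_b = solve_f(adj_a, A, x0), solve_f(adj_b, A, x0)
Q_a, Q_b = energy_Q(adj_a, f_a), energy_Q(adj_b, f_b)
h_a, h_b = escape_h(adj_a, f_a, x0), escape_h(adj_b, f_b, x0)
rate_a = sum(r for _, r in adj_a[x0])         # alpha(x0)
rate_b = sum(r for _, r in adj_b[x0])         # beta(x0)
print(Q_a, 2 * rate_a * h_a, Q_b, 2 * rate_b * h_b)
```

On an infinite graph one would apply the same computation to the sets $A_n$ and let $n \to \infty$; the finite example only illustrates the algebra.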


7.5 Exercises

7.1 Show that every irreducible, discrete-time, two-state Markov chain is
reversible with respect to its invariant probability.


7.2 Suppose $X_t$ is a continuous-time Markov chain with state space
$S = \{1, \ldots, N\}$ and symmetric rates $\alpha$.
(a) Show that for all $t$ and all $x$,

$$P\{X_{2t} = x \mid X_0 = x\} \ge \frac{1}{N}. \qquad (7.5)$$

(Hint: write

$$P\{X_{2t} = x \mid X_0 = x\} = \sum_{y \in S} P\{X_t = y \mid X_0 = x\}^2.)$$

(b) Give an example of a non-symmetric chain whose invariant probability
distribution is uniform such that (7.5) does not hold for some $x \in S$, $t > 0$.


7.3 Let $X_n$ be an aperiodic, discrete-time Markov chain on $S = \{1, \ldots, N\}$
whose transition probability is symmetric. Show that for all $x \in S$ and all
integers $n$,

$$P\{X_{2n} = x \mid X_0 = x\} \ge \frac{1}{N}.$$

Does this hold if $2n$ is replaced with $2n + 1$?

7.4 Let $X_t$ be the continuous-time simple random walk on a circle as in
Example 2, Section 7.2. Show that there exists a $c > 0$, independent of $N$,
such that for all $x, y \in \{1, \ldots, N\}$ and all $t \ge N^2$,

$$P\{X_t = y \mid X_0 = x\} \ge \frac{c}{N}.$$

(Hint: (7.5) may be helpful.)

7.5 Let $X_t$ be an aperiodic Markov chain with state space $S = \{1, \ldots, N\}$
with rates $\alpha$ and invariant probability $\pi$. For every $0 < \epsilon < 1$, let
$T_\epsilon$ be the infimum of all $t > 0$ such that for every $x, y \in S$,

$$P\{X_t = x \mid X_0 = y\} \ge \epsilon \pi(x). \qquad (7.6)$$

(a) Explain why $T_\epsilon < \infty$ for every $0 < \epsilon < 1$.
(b) Show that (7.6) holds for all $t \ge T_\epsilon$.
(c) Show that if $0 < \epsilon < 1$ and $k$ is a positive integer,

$$T_{1 - \epsilon^k} \le k T_{1 - \epsilon}.$$

(d) Let $X_t$ be the continuous-time simple random walk on a circle as in
Example 2, Section 7.2. Show that there exist $c, \beta > 0$, independent of $N$,
such that for all initial probability distributions $\bar\nu$ and all $t > 0$,

$$\|e^{tA}\bar\nu - \bar\pi\| \le c \, e^{-\beta t/N^2},$$

where $\bar\pi$ denotes the uniform distribution.


7.6 COMPUTER SIMULATION. Let M be a matrix chosen uniformly from 
the set of $50 \times 50$ matrices with entries 0 and 1 such that no two 1s are together
(see Section 7.3). Use a Markov chain simulation as described in Section 7.3 
to estimate the probability that the M(25, 25) entry of this matrix is a 1. 


7.7 COMPUTER SIMULATION. Let $S_n$ be the set of finite sequences of
numbers $(k_0, k_1, \ldots, k_n)$ where each $k_i \in \{0,1\}$ and no two 1s are
adjacent, i.e., $k_j + k_{j-1} \le 1$ for $j = 1, \ldots, n$. Let $p_n(j)$ denote the
fraction of such sequences with $k_j = 1$. Do a Markov chain simulation similar to
the previous exercise to estimate $p_{200}(0)$, $p_{200}(100)$.


7.8 In this exercise, we will calculate the values of $p_n(j)$ in Exercise 7.7
exactly. Let $r_n(ij)$ denote the number of sequences in $S_n$ with $k_0 = i$,
$k_n = j$.
(a) Explain why

$$r_{n+1}(00) = r_n(00) + r_n(01), \qquad r_{n+1}(01) = r_n(00),$$

and give similar equations for $r_{n+1}(10)$, $r_{n+1}(11)$.

(b) Use these equations to find $r_n(00)$, $r_n(01)$, $r_n(10)$, $r_n(11)$. (Hint: see
Exercise 0.3.)

(c) Find $p_n(j)$.


7.9 Find the eigenvalues of the $N \times N$ matrix $A$ from Example 2, Section
7.2,

$$A(i,j) = \begin{cases} -1, & i = j, \\ 1/2, & |i - j| \equiv 1 \pmod N, \\
  0, & \text{otherwise.} \end{cases}$$

[Hint: any eigenvector with eigenvalue $\lambda$ can be considered as a function
$f(n)$ on the integers satisfying

$$\lambda f(n) = \frac{1}{2} f(n+1) + \frac{1}{2} f(n-1) - f(n),$$

$$f(n) = f(n + N),$$

for each $n$. Find the general solution of the difference equation and then use
the periodicity condition to put restrictions on the $\lambda$.]


7.10 Let $\alpha(x,y)$ be a symmetric rate function on the edges of the integer
lattice $\mathbb{Z}^d$, i.e., a nonnegative function defined for all
$x, y \in \mathbb{Z}^d$ with $|x - y| = 1$ that satisfies $\alpha(x,y) = \alpha(y,x)$.
Suppose there exist numbers $0 < c_1 < c_2 < \infty$ such that for all $x, y$ with
$|x - y| = 1$,

$$c_1 \le \alpha(x,y) \le c_2.$$

Let $X_t$ be a continuous-time Markov chain with rates $\alpha(x,y)$.
(a) If $d = 1, 2$, show that the chain is recurrent.
(b) If $d \ge 3$, show that the chain is transient.


Chapter 8 


Brownian Motion 


8.1 Introduction 


Brownian motion is a stochastic process that models random continuous 
motion. In order to model “random continuous motion,” we start by writing 
down the physical assumptions that we will make. Let $X_t$ represent the position
of a particle at time $t$. In this case $t$ takes on values in the nonnegative
real numbers and $X_t$ takes on values in the real line (or perhaps the plane or
space). This will be an example of a stochastic process with both continuous 
time and continuous state space. 

For ease we will start with the assumption $X_0 = 0$. The next assumption
is that the motion is "completely random." Consider two times $s < t$. We
do not wish to say that the positions $X_s$ and $X_t$ are independent, but rather
that the motion after time $s$, $X_t - X_s$, is independent of $X_s$. We will need
this assumption for any finite number of times: for any
$s_1 \le t_1 \le s_2 \le t_2 \le \cdots \le s_n \le t_n$, the random variables
$X_{t_1} - X_{s_1}, X_{t_2} - X_{s_2}, \ldots, X_{t_n} - X_{s_n}$
are independent. Also the distribution of the random movements should not
change with time. Hence we will assume that the distribution of $X_t - X_s$
depends only on $t - s$. For the time being, we will also assume that there is
no "drift" to the process, i.e., $E(X_t) = 0$.

The above assumptions are not sufficient to describe the model we want.
In fact, if $Y_t$ is the Poisson process and $X_t = Y_t - t$ [so that $E(X_t) = 0$],
$X_t$ satisfies these assumptions but is clearly not a model for continuous motion.
We will include as our final assumption for our model this continuity: the
function $X_t$ is a continuous function of $t$.

It turns out that the above assumptions uniquely describe the process at
least up to a scaling constant. Suppose the process $X_t$ satisfies these
assumptions. What is the distribution of the random variable $X_t$? For ease, we
will discuss the case $t = 1$. For any $n$, we can then write

$$X_1 = [X_{1/n} - X_0] + [X_{2/n} - X_{1/n}] + \cdots + [X_{n/n} - X_{(n-1)/n}].$$

In other words, $X_1$ can be written as the sum of $n$ independent, identically
distributed random variables. Moreover, if $n$ is large, each of the random
variables is small. To be more precise, if we let

$$M_n = \max\{|X_{1/n} - X_0|, |X_{2/n} - X_{1/n}|, \ldots, |X_{n/n} - X_{(n-1)/n}|\},$$

then as $n \to \infty$, $M_n \to 0$. This is a consequence of the assumption that $X_t$
is a continuous function of $t$ (if $M_n$ did not go to 0 then there would be a
"jump" in the path of $X_t$). It is a theorem of probability theory that the
only distribution that can be written as the sum of $n$ independent, identically
distributed random variables such that the maximum of the variables goes to
0 is a normal distribution. We can thus conclude that the distribution of $X_t$
is a normal distribution. We now formalize this definition.


Definition. A Brownian motion or a Wiener process with variance parameter
$\sigma^2$ is a stochastic process $X_t$ taking values in the real numbers satisfying

(i) $X_0 = 0$;

(ii) For any $s_1 \le t_1 \le s_2 \le t_2 \le \cdots \le s_n \le t_n$, the random
variables $X_{t_1} - X_{s_1}, \ldots, X_{t_n} - X_{s_n}$ are independent;

(iii) For any $s < t$, the random variable $X_t - X_s$ has a normal distribution
with mean 0 and variance $(t - s)\sigma^2$;

(iv) The paths are continuous, i.e., the function $t \mapsto X_t$ is a continuous
function of $t$.


While it is standard to include the fact that the increments are normally
distributed in the definition, it is worth remembering that this fact can
actually be deduced from the physical assumptions. Standard Brownian motion
is a Brownian motion with $\sigma^2 = 1$. We can also speak of a Brownian motion
starting at $x$; this is a process satisfying conditions (ii) through (iv) and the
initial condition $X_0 = x$. If $X_t$ is a Brownian motion (starting at 0), then
$Y_t = X_t + x$ is a Brownian motion starting at $x$.

Brownian motion can be constructed as a limit of random walks. Suppose
$S_n$ is an unbiased random walk on the integers. We can write

$$S_n = Y_1 + \cdots + Y_n,$$

where the random variables $Y_i$ are independent,

$$P\{Y_i = 1\} = P\{Y_i = -1\} = \frac{1}{2}.$$

Now instead of having time increments of size 1 we will have increments of
size $\Delta t = 1/N$ where $N$ is an integer. We will set

$$W_{k\Delta t} = a_N S_k,$$

where we choose a normalizing constant $a_N$ so that $W_1$ has variance 1. Since
$\mathrm{Var}(S_N) = N$, it is clear that we must choose $a_N = N^{-1/2}$. Hence in
this discrete approximation, the size of the jump in time $\Delta t = 1/N$ is
$1/\sqrt{N} = (\Delta t)^{1/2}$. We can consider the discrete approximation as a
process for all values of $t$ by linear interpolation (see the figure below).

[Figure: linearly interpolated random walk path; the time axis is marked at
multiples of $\Delta t$.]


As N — oo, this discrete approximation approaches a continuous-time, 
continuous-space process. By the central limit theorem the distribution of 


$$\frac{S_N}{\sqrt{N}}$$

approaches a normal distribution with mean 0 and variance 1. Similarly,
the distribution of $W_t$ approaches a normal distribution with mean 0 and
variance t. The limiting process can be shown to be a standard Brownian 
motion. (It requires some sophisticated mathematics to state explicitly what 
kind of limit is being taken here. We will not worry about this detail.) 
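
The scaling in this construction is easy to try out. The sketch below simulates many independent copies of the rescaled walk at a fixed time $t$ and estimates the variance, which should be close to $t$. It is an illustration only: the function names, the number of paths, and the seed are arbitrary choices, and the agreement is statistical rather than exact.

```python
import random

def scaled_walk_endpoint(N, t, rng):
    """W_t = N^{-1/2} S_k with k = t N steps of size +1 or -1."""
    k = int(round(t * N))
    s = 0
    for _ in range(k):
        s += 1 if rng.random() < 0.5 else -1
    return s / N ** 0.5

def sample_variance(N, t, paths, seed=0):
    """Unbiased sample variance of W_t over independent simulated paths."""
    rng = random.Random(seed)
    xs = [scaled_walk_endpoint(N, t, rng) for _ in range(paths)]
    mean = sum(xs) / paths
    return sum((x - mean) ** 2 for x in xs) / (paths - 1)

v = sample_variance(N=100, t=1.0, paths=3000)
print(v)   # should be close to t = 1
```

Each jump has size $N^{-1/2} = (\Delta t)^{1/2}$, so the variance accumulated over $tN$ steps is exactly $t$; the simulation only adds sampling error on top of that.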

The path of a Brownian motion is very rough. Consider the increment
$X_{t+\Delta t} - X_t$ for small $\Delta t$. The distribution of this increment has
mean 0, variance $\Delta t$ so

$$E(|X_{t+\Delta t} - X_t|^2) = \Delta t.$$

In other words the typical size of an increment, $|X_{t+\Delta t} - X_t|$, is about
$\sqrt{\Delta t}$. As $\Delta t \to 0$, $\sqrt{\Delta t} \to 0$, which is consistent with
the continuity of the paths. What about differentiability? Does it make sense to
talk about $dX_t/dt$? Recall the definition of the derivative from calculus,

$$\frac{dX_t}{dt} = \lim_{\Delta t \to 0} \frac{X_{t+\Delta t} - X_t}{\Delta t}.$$

When $\Delta t$ is small, the absolute value of the numerator is on the order of
$\sqrt{\Delta t}$ which is much larger than $\Delta t$. Hence, this limit does not
exist. By a sharpening of this argument one can prove the following.


Fact. The path of a Brownian motion $X_t$ is nowhere differentiable.


Care is needed in proving statements such as the one above. The intuitive
argument can be used fairly easily to prove the statement "for each $t$, the
probability that $X_t$ is not differentiable at $t$ is 1." This is not as strong as
the fact above which states "the probability that $X_t$ is not differentiable at
all values of $t$ is 1." This distinction is a little tricky to understand. As a
possibly easier example consider the following two statements: "For each $t$,
the probability that $X_t \ne 1$ is 1" and "The probability that $X_t \ne 1$ for all
values of $t$ is 1." These statements are not the same, and, in fact, the first is
true and the second is false. For any given $t$, $X_t$ has a normal distribution;
hence the probability of taking on any particular value is 0 (this is true for any
continuous distribution). However, the probability that $X_1 > 1$ is certainly
greater than 0. If $X_0 = 0$ and $X_1 > 1$, then the continuity of $X_t$ implies that
$X_t = 1$ for some $0 < t < 1$. Hence the probability that $X_t = 1$ for some
$0 < t < 1$ is greater than 0. The difficulty here comes with the fact that the
real numbers are uncountable. We can write

$$\{X_t = 1 \text{ for some } 0 < t < 1\} = \bigcup_{0 < t < 1} \{X_t = 1\}.$$

The right-hand side is a union of sets each with probability 0. However, it is
an uncountable union of such sets. The axioms of probability imply that the
countable union of sets of probability 0 has probability 0 but do not say
the same for an uncountable union. This phenomenon arises whenever one
deals in continuous probability. For example, if $Y$ is any continuous random
variable then

$$\{-\infty < Y < \infty\} = \bigcup_{-\infty < y < \infty} \{Y = y\}.$$

The right-hand side is a union of events with probability 0, but the left-hand
side has probability 1.

In stochastic processes with continuous time and space, many difficult tech- 
nical problems can arise in trying to deal with uncountable unions of sets. We 
will ignore most of these issues here. Most of these problems are relatively 
easily overcome for Brownian motion. 


8.2 Markov Property 


Let $X_t$ be a standard Brownian motion. We will let $\mathcal{F}_t$ represent the
information contained in $X_s$, $s \le t$, in other words all the information that can
be obtained from watching the Brownian motion up through time $t$. Suppose
$s < t$ and consider the conditional expectation $E(X_t \mid \mathcal{F}_s)$. Note that

$$E(X_t \mid \mathcal{F}_s) = E(X_s \mid \mathcal{F}_s) + E(X_t - X_s \mid \mathcal{F}_s).$$

Since $X_s$ is $\mathcal{F}_s$ measurable, the first term on the right-hand side equals
$X_s$. Since $X_t - X_s$ is independent of $\mathcal{F}_s$, the second term equals
$E(X_t - X_s) = 0$. Hence

$$E(X_t \mid \mathcal{F}_s) = X_s = E(X_t \mid X_s).$$

The equality of the left-hand and right-hand sides above illustrates the Markov
property of Brownian motion, i.e., in order to predict $X_t$ given all the
information up through time $s$, it suffices to consider only the value of the
Brownian motion at time $s$. More generally, the Markov property implies that for
functions $f$,

$$E[f(X_t) \mid \mathcal{F}_s] = E[f(X_t) \mid X_s].$$

Brownian motion satisfies this property. This follows from an even stronger
property of Brownian motion: if $Y_t = X_{s+t} - X_s$, then $Y_t$ is a Brownian
motion independent of $\mathcal{F}_s$. In other words $Z_t = X_{s+t}$ is a Brownian
motion starting at the (random) starting point $X_s$.

Let $p_t(x,y)$ denote the transition densities, i.e., the density of $X_t$ for
Brownian motion starting at $x$. Since $X_t - X_0$ is normal, mean 0, variance $t$,

$$p_t(x,y) = \frac{1}{\sqrt{2\pi t}} \, e^{-(y-x)^2/2t}, \qquad -\infty < y < \infty.$$

The transition densities satisfy the Chapman-Kolmogorov equation

$$p_{s+t}(x,y) = \int_{-\infty}^{\infty} p_s(x,z) \, p_t(z,y) \, dz.$$

This can be verified directly for this transition function, but one can also see
this by appealing to the Markov property. Since $Z_t = X_{s+t}$ is a Brownian
motion starting at $X_s$, the Chapman-Kolmogorov equation averages the density
$p_t(z,y)$ over all possible starting points $z$.
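
The Chapman-Kolmogorov equation for the Gaussian transition density can also be checked numerically. The sketch below approximates the integral over $z$ by the trapezoid rule on a wide but finite window; the window width, the number of grid points, and the function names are arbitrary choices made for the illustration.

```python
import math

def p(t, x, y):
    """Transition density of standard Brownian motion."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def chapman_kolmogorov_rhs(s, t, x, y, half_width=12.0, steps=24000):
    """Trapezoid approximation of the integral of p_s(x,z) p_t(z,y) dz."""
    lo = min(x, y) - half_width
    hi = max(x, y) + half_width
    dz = (hi - lo) / steps
    total = 0.5 * (p(s, x, lo) * p(t, lo, y) + p(s, x, hi) * p(t, hi, y))
    for k in range(1, steps):
        z = lo + k * dz
        total += p(s, x, z) * p(t, z, y)
    return total * dz

s, t, x, y = 0.7, 1.3, 0.2, -0.5
lhs = p(s + t, x, y)
rhs = chapman_kolmogorov_rhs(s, t, x, y)
print(lhs, rhs)
```

The two numbers agree to many digits because the integrand is smooth and decays extremely fast outside the window.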

In order to do many useful computations about Brownian motions, a more
general Markov property is needed. This is generally referred to as the strong
Markov property. We first need the notion of a real-valued stopping time.
The definition is a generalization of the definition of a stopping time given
for discrete-time processes. We say that a random variable $T$ taking values
in $[0, \infty]$ is a stopping time for Brownian motion if for each $t$ the (indicator
function of the) event $\{T \le t\}$ is measurable with respect to $\mathcal{F}_t$. In
other words, to know whether or not the process has stopped before time $t$, one
only needs to look at the Brownian motion up through time $t$. The most
important examples will be stopping times of the form

$$T_x = \inf\{t : X_t = x\}.$$

If $T$ is a stopping time, we write $\mathcal{F}_T$ for the information contained in the
Brownian motion up through the stopping time $T$ (one gets to view the path
up through time $T$ but not beyond). We will let $Y_t$ denote the process beyond
time $T$,

$$Y_t = X_{t+T} - X_T.$$


Strong Markov Property. $Y_t$ is a Brownian motion independent of $\mathcal{F}_T$.


It is easier to see what this means by considering an example of how the
property is used. Suppose the Brownian motion starts at 0 and we want to
calculate the probability that there exists some $t$ with $0 \le t \le 1$ and
$X_t \ge 1$. Let $T = T_1$ be the first time that the Brownian motion equals 1. Then,
by continuity, the event $\{X_t \ge 1 \text{ for some } 0 \le t \le 1\}$ is the same as
the event $\{T \le 1\}$. Since

$$P\{T = 1\} \le P\{X_1 = 1\} = 0,$$

we can see that

$$P\{T \le 1\} = P\{T < 1\}.$$

Now consider the event $\{X_1 \ge 1\}$. Since $X_1$ is normal, mean 0, variance 1,

$$P\{X_1 \ge 1\} = \int_1^\infty \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2} \, dx.$$

Also,

$$P\{X_1 \ge 1\} = P\{T \le 1\} \, P\{X_1 \ge 1 \mid T \le 1\}.$$

Now we use the strong Markov property. Suppose $T \le 1$. We may assume
in fact that $T < 1$ (since $T = 1$ has probability 0 of occurring). Then,
given $T$, $X_1 - X_T = X_1 - 1$ is a normal random variable, mean 0, variance
$1 - T$. Regardless of the variance, we know by the symmetry of the normal
distribution that the probability that this normal random variable is greater
than or equal to 0 is 1/2. Hence, we conclude

$$P\{X_1 - 1 \ge 0 \mid T \le 1\} = 1/2.$$

Therefore

$$P\{T \le 1\} = 2P\{X_1 \ge 1\} = 2\int_1^\infty \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2} \, dx.$$

This result is a particular case of the reflection principle. We now state the
general result which is proved in the same way.


Reflection Principle. Suppose $X_t$ is a Brownian motion with variance parameter 
$\sigma^2$ starting at $a$ and $a < b$. Then for any $t > 0$, 

\[ P\{X_s \ge b \text{ for some } 0 \le s \le t\} = 2\, P\{X_t \ge b \mid X_0 = a\}
   = 2 \int_b^\infty \frac{1}{\sqrt{2\pi t \sigma^2}}\, e^{-(x-a)^2/2\sigma^2 t}\, dx. \]


Example 1. Let $t > 1$ and let us compute the probability that a standard 
Brownian motion crosses the x-axis sometime between times 1 and $t$, i.e., 

\[ P\{X_s = 0 \text{ for some } 1 \le s \le t\}. \]


We first condition on what happens at time $t = 1$. Suppose $X_1 = b > 0$. Then 
the probability that $X_s = 0$ for some $1 \le s \le t$ is the same as the probability 
that $X_s \le -b$ for some $0 \le s \le t-1$. This is the same (by symmetry) as the 
probability that $X_s \ge b$ for some $0 \le s \le t-1$. This probability is given by 
the reflection principle, so 

\[ P\{X_s = 0 \text{ for some } 1 \le s \le t \mid X_1 = b\} = 2 \int_b^\infty \frac{1}{\sqrt{2\pi(t-1)}}\, e^{-x^2/2(t-1)}\, dx. \]


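By symmetry, again, the probability is the same if $X_1 = -b$. Hence, by 
averaging over all possible values of $b$ we get 

\[ P\{X_s = 0 \text{ for some } 1 \le s \le t\}
   = \int_{-\infty}^{\infty} p_1(0,b)\, P\{X_s = 0 \text{ for some } 1 \le s \le t \mid X_1 = b\}\, db \]

\[ = 2 \int_0^\infty \frac{1}{\sqrt{2\pi}}\, e^{-b^2/2} \left[ 2 \int_b^\infty \frac{1}{\sqrt{2\pi(t-1)}}\, e^{-x^2/2(t-1)}\, dx \right] db. \]

The substitution $y = x/\sqrt{t-1}$ in the inside integral reduces this integral to 

\[ 4 \int_0^\infty \int_{b(t-1)^{-1/2}}^\infty \frac{1}{2\pi}\, e^{-(b^2+y^2)/2}\, dy\, db. \]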


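This integral can be computed using polar coordinates. Note that the region 
$\{0 \le b < \infty,\ b(t-1)^{-1/2} \le y < \infty\}$ corresponds to the polar region 
$\{0 \le r < \infty,\ \arctan((t-1)^{-1/2}) \le \theta \le \pi/2\}$. Hence the probability equals 

\[ 4 \int_0^\infty \int_{\arctan((t-1)^{-1/2})}^{\pi/2} \frac{1}{2\pi}\, r e^{-r^2/2}\, d\theta\, dr
   = 4 \left( \frac{\pi}{2} - \arctan\frac{1}{\sqrt{t-1}} \right) \frac{1}{2\pi} \int_0^\infty r e^{-r^2/2}\, dr \]

\[ = 1 - \frac{2}{\pi} \arctan\frac{1}{\sqrt{t-1}}. \]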
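
The closed form of Example 1 can be compared with a direct numerical evaluation of the averaged integral. The sketch below is only a sanity check under our own choices: the function names, the truncation point `b_max`, and the midpoint rule are assumptions made for illustration, and the inner integral is written as `erfc` because $2 P\{N(0, t-1) \ge b\} = \mathrm{erfc}(b/\sqrt{2(t-1)})$.

```python
import math

def crossing_prob_closed_form(t):
    """1 - (2/pi) arctan(1 / sqrt(t - 1)), valid for t > 1."""
    return 1.0 - (2.0 / math.pi) * math.atan(1.0 / math.sqrt(t - 1.0))

def crossing_prob_quadrature(t, b_max=10.0, steps=20000):
    """Midpoint rule for 2 * int_0^inf p_1(0, b) * P{zero in [1, t] | X_1 = b} db.

    The conditional probability comes from the reflection principle and equals
    erfc(b / sqrt(2 (t - 1))). The integral is truncated at b_max.
    """
    h = b_max / steps
    total = 0.0
    for i in range(steps):
        b = (i + 0.5) * h
        density = math.exp(-b * b / 2.0) / math.sqrt(2.0 * math.pi)
        hit_given_b = math.erfc(b / math.sqrt(2.0 * (t - 1.0)))
        total += density * hit_given_b * h
    return 2.0 * total

for t in (2.0, 5.0):
    print(t, crossing_prob_closed_form(t), crossing_prob_quadrature(t))
```

For $t = 2$ both values are 1/2: a standard Brownian motion has an even chance of returning to 0 between times 1 and 2.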


Example 2. We will show that (with probability one) 

\[ \lim_{t \to \infty} \frac{X_t}{t} = 0. \]

First, we consider the limit taken over only integer times. Note that for $n$ an 
integer, 

\[ X_n = (X_1 - X_0) + \cdots + (X_n - X_{n-1}) \]

is a sum of independent, identically distributed random variables. It follows 
from the (strong) law of large numbers that 

\[ \lim_{n \to \infty} \frac{X_n}{n} = 0. \]

For each $n$, let 

\[ M_n = \sup\{|X_t - X_n| : n \le t \le n+1\}. \]

If we can show that 

\[ \lim_{n \to \infty} \frac{M_n}{n} = 0, \]

we will be finished since for any $t$, if $n \le t \le n+1$, 

\[ \frac{|X_t|}{t} \le \frac{|X_t|}{n} \le \frac{|X_n| + |M_n|}{n}. \]


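For any $a > 0$, symmetry and the reflection principle state that 

\[ P\{|M_n| \ge a\} \le 4 \int_a^\infty \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx
   \le 4 \int_a^\infty \frac{1}{\sqrt{2\pi}}\, e^{-ax/2}\, dx
   = \frac{8}{a\sqrt{2\pi}}\, e^{-a^2/2}. \]

If we plug in $a = 2(\ln n)^{1/2}$, we get 

\[ P\{|M_n| \ge 2\sqrt{\ln n}\} \le \frac{8}{2\sqrt{2\pi \ln n}\; n^2}. \]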
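
To get a feel for the size of this bound, the following sketch compares it with the exact Gaussian tail $4 P\{X_1 \ge a\}$ at $a = 2\sqrt{\ln n}$ and finds the first $n$ at which the bound drops below $n^{-2}$. The function names are ours and purely illustrative.

```python
import math

def exact_tail(n):
    """4 P{X_1 >= a} with a = 2 sqrt(ln n), for a standard normal X_1."""
    a = 2.0 * math.sqrt(math.log(n))
    return 2.0 * math.erfc(a / math.sqrt(2.0))  # 4 * (1/2) * erfc(a / sqrt 2)

def upper_bound(n):
    """The bound 8 / (2 sqrt(2 pi ln n) n^2) obtained by plugging in a = 2 sqrt(ln n)."""
    return 8.0 / (2.0 * math.sqrt(2.0 * math.pi * math.log(n)) * n * n)

# Smallest n >= 2 for which the bound is below n^(-2).
first_n = next(n for n in range(2, 1000) if upper_bound(n) < n ** -2)
print(first_n)  # 13
```

Since $\sum n^{-2}$ converges, a bound of this size is summable in $n$.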


In particular, for all $n$ sufficiently large, the probability is less than $n^{-2}$. If 
we let $I_n$ denote the indicator function of the event $\{|M_n| \ge 2\sqrt{\ln n}\}$ and 

\[ I = \sum_{n=1}^{\infty} I_n, \]

we find that $E(I) < \infty$. This states that the expected number of times that 
$|M_n| \ge 2\sqrt{\ln n}$ is finite and hence that, with probability one, $|M_n| \ge 2\sqrt{\ln n}$ 
only finitely often. In particular, this implies that $n^{-1} M_n \to 0$. 


8.3 Zero Set of Brownian Motion 


In this section we will investigate the (random) set 

\[ Z = \{t : X_t = 0\}. \]

It turns out that this set is an interesting “fractal” subset of the real line. 
In analyzing this set we will use two important scaling results about Brow- 
nian motion which will be proved in the exercises (see Exercises 8.7 and 8.8). 


Scaling Properties. Suppose $X_t$ is a standard Brownian motion. Then, 

(1) If $a > 0$, and $Y_t = a^{-1/2} X_{at}$, then $Y_t$ is a standard Brownian motion. 

(2) If $X_t$ is a standard Brownian motion and $Y_t = t X_{1/t}$, then $Y_t$ is a 
standard Brownian motion. 


In an example in the previous section, we proved that 

\[ P\{Z \cap [1,t] \ne \emptyset\} = 1 - \frac{2}{\pi} \arctan\frac{1}{\sqrt{t-1}}. \]

As $t \to \infty$ the quantity on the right-hand side tends to 1. This tells us that 
with probability 1 the Brownian motion eventually returns to the origin, and 
hence (with the help of the strong Markov property) that it returns infinitely 
often. This means that the Brownian motion for large t has both positive and 
negative values. 

What happens near $t = 0$? Let $Y_t = t X_{1/t}$. Then $Y_t$ is also a standard 
Brownian motion. As time goes to infinity in the process $X$, time goes to 0 in 
$Y$. Hence, since $X_t$ has both positive and negative values for arbitrarily large 
values of $t$, $Y_t$ has positive and negative values for arbitrarily small values of 
$t$. This states that in any interval about 0 the Brownian motion takes on both 
positive and negative values (and hence by continuity also the value 0)! 

One topological property that $Z$ satisfies is the fact that $Z$ is a closed set. 
This means that if a sequence of points $t_i \in Z$ and $t_i \to t$, then $t \in Z$. This 
follows from the continuity of the function $X_t$. For any continuous function, 
if $t_i \to t$, then $X_{t_i} \to X_t$. We have seen that 0 is not an isolated point of $Z$, 
i.e., there are positive numbers $t_i \in Z$ such that $t_i \to 0$. It can be shown that 
none of the points of $Z$ are isolated points. From a topological perspective $Z$ 
looks like the Cantor set (see the example below for a definition). 

How “big” is the set Z? To discuss this we need to discuss the notion of 
a dimension of a set. There are two similar notions of dimension, Hausdorff 
dimension and box dimension, which can give fractional dimensions to sets. 
(There is a phrase “fractal dimension” which is used a lot in scientific literature. 
As a rule, the people who use this phrase are not distinguishing between 
Hausdorff and box dimension and could mean either one.) The notion of dimension 
we will discuss here will be that of box dimension, but all the sets we 
will discuss have Hausdorff dimension equal to their box dimension. Suppose 
we have a bounded set $A$ in $d$-dimensional space $\mathbb{R}^d$. Suppose we cover $A$ 
with $d$-dimensional balls of diameter $\epsilon$. How many such balls are needed? If 
$A$ is a line segment of length 1 (one-dimensional set), then $\epsilon^{-1}$ such balls are 
needed. If $A$ is a two-dimensional square, however, on the order of $\epsilon^{-2}$ such 
balls are needed. One can see that for a standard $k$-dimensional set, we need 
$\epsilon^{-k}$ such balls. This leads us to define the (box) dimension of the set $A$ to be 
the number $D$ such that for small $\epsilon$ the number of balls of diameter $\epsilon$ needed 
to cover $A$ is on the order of $\epsilon^{-D}$. 


Example. Consider the fractal subset of $[0,1]$, the Cantor set. The Cantor 
set $A$ can be defined as a limit of approximate Cantor sets $A_n$. We start with 
$A_0 = [0,1]$. The next set $A_1$ is obtained by removing the open middle interval 
$(1/3, 2/3)$, so that 

\[ A_1 = \left[0, \frac{1}{3}\right] \cup \left[\frac{2}{3}, 1\right]. \]

The second set $A_2$ is obtained by removing the middle thirds of the two 
intervals in $A_1$, hence 

\[ A_2 = \left[0, \frac{1}{9}\right] \cup \left[\frac{2}{9}, \frac{1}{3}\right] \cup \left[\frac{2}{3}, \frac{7}{9}\right] \cup \left[\frac{8}{9}, 1\right]. \]

In general $A_{n+1}$ is obtained from $A_n$ by removing the “middle third” of each 
interval. The Cantor set $A$ is then the limit of these sets $A_n$, 

\[ A = \bigcap_{n=1}^{\infty} A_n. \]


Note that $A_n$ consists of $2^n$ intervals each of length $3^{-n}$. Suppose we try 
to cover $A$ by intervals of length $3^{-n}$, 

\[ \left[\frac{k-1}{3^n}, \frac{k}{3^n}\right]. \]

We need $2^n$ such intervals. Hence the dimension $D$ of the Cantor set is the 
number such that $2^n = (3^{-n})^{-D}$, i.e., 

\[ D = \frac{\ln 2}{\ln 3} \approx 0.631. \]
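
The counting argument for the Cantor set is easy to reproduce. The sketch below builds the intervals of $A_n$ with exact fractions and forms the ratio $\ln(\text{number of intervals}) / \ln(1/\text{length})$. The function names are ours, chosen for illustration.

```python
import math
from fractions import Fraction

def cantor_intervals(n):
    """Return the closed intervals of A_n as (left, right) pairs of exact fractions."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_level = []
        for left, right in intervals:
            third = (right - left) / 3
            next_level.append((left, left + third))    # keep the left third
            next_level.append((right - third, right))  # keep the right third
        intervals = next_level
    return intervals

def box_dimension_estimate(n):
    """log(number of covering intervals) / log(1 / interval length) at level n."""
    intervals = cantor_intervals(n)
    length = intervals[0][1] - intervals[0][0]
    return math.log(len(intervals)) / math.log(float(1 / length))

print(cantor_intervals(1))
print(box_dimension_estimate(5))  # ln 2 / ln 3, about 0.6309
```

Because the set is exactly self-similar, the ratio is the same at every level $n$.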


Now consider the set $Z$ and consider $Z_1 = Z \cap [0,1]$. We will try to cover 
$Z_1$ by one-dimensional balls (i.e., intervals) of diameter (length) $\epsilon = 1/n$. For 
ease we will consider the $n$ intervals 

\[ \left[\frac{k-1}{n}, \frac{k}{n}\right], \quad k = 1, 2, \ldots, n. \]

How many of these intervals are needed to cover $Z_1$? Such an interval is 
needed if $Z_1 \cap [(k-1)/n, k/n] \ne \emptyset$. What is 

\[ P(k,n) = P\left\{ Z_1 \cap \left[\frac{k-1}{n}, \frac{k}{n}\right] \ne \emptyset \right\}? \]

Assume $k > 1$ (if $k = 1$, the probability is 1 since $0 \in Z$). By the scaling 
property of Brownian motion, $Y_t = ((k-1)/n)^{-1/2} X_{t(k-1)/n}$ is a standard 
Brownian motion. Hence 

\[ P(k,n) = P\left\{ Y_t = 0 \text{ for some } 1 \le t \le \frac{k}{k-1} \right\}. \]

This probability was calculated in the previous section, 

\[ P(k,n) = 1 - \frac{2}{\pi} \arctan\sqrt{k-1}. \]


Therefore, the expected number of the intervals needed to cover $Z_1$ looks like 

\[ \sum_{k=1}^{n} P(k,n) = \sum_{k=1}^{n} \left[ 1 - \frac{2}{\pi} \arctan\sqrt{k-1} \right]. \]

To estimate the sum, we need to consider the Taylor series for $\arctan(1/t)$ at 
$t = 0$ (which requires remembering the derivative of arctan), 

\[ \arctan\frac{1}{t} = \frac{\pi}{2} - t + O(t^2). \]

In other words, for $x$ large, 

\[ \arctan x \approx \frac{\pi}{2} - \frac{1}{x}. \]

This gives 

\[ \sum_{k=1}^{n} P(k,n) \approx 1 + \frac{2}{\pi} \int_1^n (x-1)^{-1/2}\, dx \approx \frac{4}{\pi} \sqrt{n}. \]

Hence it takes on the order of $\sqrt{n}$ intervals of length $1/n$ to cover $Z_1$, or, in 
other words, 


Fact. The fractal dimension of the zero set Z is 1/2. 
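
The square-root growth behind this fact can be seen numerically by summing $P(k,n) = 1 - (2/\pi)\arctan\sqrt{k-1}$ directly. The sketch below is a rough check, with function names of our own choosing; it estimates the growth exponent from two values of $n$ and compares the sum with $(4/\pi)\sqrt{n}$.

```python
import math

def expected_cover_count(n):
    """Sum over k = 1..n of P(k, n) = 1 - (2/pi) arctan(sqrt(k - 1))."""
    return sum(
        1.0 - (2.0 / math.pi) * math.atan(math.sqrt(k - 1))
        for k in range(1, n + 1)
    )

def growth_exponent(n_small, n_large):
    """Slope of log(expected count) against log(n) between two values of n."""
    ratio = expected_cover_count(n_large) / expected_cover_count(n_small)
    return math.log(ratio) / math.log(n_large / n_small)

print(growth_exponent(1000, 100000))            # close to 1/2
print(expected_cover_count(10000) / 100.0)      # close to 4/pi
```

The exponent close to 1/2 is the box dimension suggested by the covering count.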


8.4 Brownian Motion in Several Dimensions 


Suppose $X_t^1, \ldots, X_t^d$ are independent (one-dimensional) standard Brownian 
motions. We will call the vector-valued stochastic process 

\[ X_t = (X_t^1, \ldots, X_t^d) \]


a standard d-dimensional Brownian motion. In other words, a d-dimensional 
Brownian motion is a process in which each component performs a Brownian 
motion, and the component Brownian motions are independent. 

It is not difficult to show that $X_t$ defined as above satisfies the following: 


(i) $X_0 = 0$; 

(ii) for any $s_1 \le t_1 \le s_2 \le t_2 \le \cdots \le s_n \le t_n$, the (vector-valued) random 
variables $X_{t_1} - X_{s_1}, \ldots, X_{t_n} - X_{s_n}$ are independent; 

(iii) the random variable $X_t - X_s$ has a joint normal distribution with mean 
0 and covariance matrix $(t-s)I$, i.e., has density $f(x_1, \ldots, x_d)$ equal to 

\[ \left( \frac{1}{\sqrt{2\pi r}}\, e^{-x_1^2/2r} \right) \cdots \left( \frac{1}{\sqrt{2\pi r}}\, e^{-x_d^2/2r} \right) = \frac{1}{(2\pi r)^{d/2}}\, e^{-|x|^2/2r}, \]

where $r = t - s$; 

(iv) $X_t$ is a continuous function of $t$. 

We could use (i) through (iv) as the definition of $X_t$, but we would quickly 
discover that we could construct $X_t$ by taking $d$ independent one-dimensional 
Brownian motions. As in the one-dimensional case we let $p_t(x,y)$, $x, y \in \mathbb{R}^d$, 
denote the probability density of $X_t$ assuming $X_0 = x$ (it is clear how to 
define a Brownian motion starting at any point in $\mathbb{R}^d$), 

\[ p_t(x,y) = \frac{1}{(2\pi t)^{d/2}}\, e^{-|y-x|^2/2t}. \]

Again, this satisfies the Chapman-Kolmogorov equation 

\[ p_{s+t}(x,y) = \int_{\mathbb{R}^d} p_s(x,z)\, p_t(z,y)\, dz_1 \cdots dz_d. \]

Brownian motion is closely related to the theory of diffusion. Suppose that 
a large number of particles are distributed in $\mathbb{R}^d$ according to a density $f(y)$. 
Let $f(t,y)$ denote the density of the particles at time $t$ (so that $f(0,y) = 
f(y)$). If we assume that the particles perform standard Brownian motions, 
independently, then we can write the density of particles at time $t$. If a particle 
starts at position $x$, then the probability density for its position at time $t$ is 
$p_t(x,y)$. By integrating, we get 

\[ f(t,y) = \int_{\mathbb{R}^d} f(x)\, p_t(x,y)\, dx_1 \cdots dx_d. \]


The symmetry of Brownian motion tells us that $p_t(x,y) = p_t(y,x)$. Hence we 
can write the right-hand side as 

\[ \int_{\mathbb{R}^d} f(x)\, p_t(y,x)\, dx_1 \cdots dx_d. \]

The right-hand side represents the expected value of $f(X_t)$ assuming $X_0 = y$. 
We can then write this, 

\[ f(t,y) = E^y[f(X_t)]. \]

The notation $E^y$ is used to denote expectations of $X_t$ assuming $X_0 = y$. 

We will now derive a differential equation that $f(t,x)$ satisfies. Consider 
$\partial f/\partial t$; for ease we will take $t = 0$, $d = 1$. If $f$ is sufficiently nice, we can write 
the Taylor series for $f$ about $x$, 

\[ f(y) = f(x) + f'(x)(y-x) + \frac{1}{2} f''(x)(y-x)^2 + o((y-x)^2), \]

where $o(\cdot)$ denotes an error term such that $o((y-x)^2)/(y-x)^2 \to 0$ as $y \to x$. 
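Therefore, 

\[ \left.\frac{\partial f}{\partial t}\right|_{t=0} = \lim_{t \to 0} \frac{1}{t}\, E^x[f(X_t) - f(X_0)] \]

\[ = \lim_{t \to 0} \frac{1}{t} \left[ f'(x)\, E^x[X_t - x] + \frac{1}{2} f''(x)\, E^x[(X_t - x)^2] + o((X_t - x)^2) \right]. \]

We know that $E^x[X_t - x] = 0$ and $E^x[(X_t - x)^2] = \mathrm{Var}(X_t) = t$. Also since 
$(X_t - x)^2$ is of order $t$, the term $t^{-1} o(\cdot)$ tends to 0. Hence we get 

\[ \left.\frac{\partial f}{\partial t}\right|_{t=0} = \frac{1}{2} f''(x). \]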


The same argument holds for all $t$ giving 

\[ \frac{\partial f}{\partial t} = \frac{1}{2} \frac{\partial^2 f}{\partial x^2}. \]

Similarly, we can extend this argument to $d$ dimensions and show that $f$ 
satisfies the equation 

\[ \frac{\partial f}{\partial t} = \frac{1}{2} \Delta f, \]

where $\Delta$ denotes the Laplacian, 

\[ \Delta f(x_1, \ldots, x_d) = \sum_{i=1}^{d} \frac{\partial^2 f}{\partial x_i^2}. \]

This equation is often called the heat equation. One can find a similar solution 
to the heat equation with diffusion constant $D$, 

\[ \frac{\partial f}{\partial t} = \frac{D}{2} \Delta f, \]

by considering Brownian motions with variance parameter $\sigma^2 = D$. 
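
The relation between $E^y[f(X_t)]$ and the heat equation can be checked numerically in one dimension. The sketch below is an illustration under our own assumptions: it computes $E^y[f(X_t)]$ by a midpoint rule against the standard normal density (writing $X_t = y + \sqrt{\sigma^2 t}\, Z$), and then forms a finite-difference residual of $\partial f/\partial t - (\sigma^2/2)\, \partial^2 f/\partial y^2$. The function names, step sizes, and the test function $\cos$ are our choices.

```python
import math

def heat_solution(t, y, f, sigma2=1.0, z_max=8.0, steps=4000):
    """E^y[f(X_t)] for Brownian motion with variance parameter sigma2 (d = 1).

    Midpoint rule in the standard normal variable Z, with X_t = y + sqrt(sigma2 t) Z.
    """
    h = 2.0 * z_max / steps
    scale = math.sqrt(sigma2 * t)
    total = 0.0
    for i in range(steps):
        z = -z_max + (i + 0.5) * h
        weight = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
        total += f(y + scale * z) * weight * h
    return total

def heat_residual(t, y, f, sigma2=1.0, dt=1e-4, dy=1e-2):
    """Finite-difference estimate of df/dt - (sigma2 / 2) d^2 f / dy^2 (should be near 0)."""
    u = lambda s, x: heat_solution(s, x, f, sigma2)
    time_derivative = (u(t + dt, y) - u(t - dt, y)) / (2.0 * dt)
    second_space = (u(t, y + dy) - 2.0 * u(t, y) + u(t, y - dy)) / (dy * dy)
    return time_derivative - 0.5 * sigma2 * second_space

# For f = cos the expectation is known in closed form: cos(y) exp(-sigma2 t / 2).
print(heat_solution(1.0, 0.3, math.cos), math.cos(0.3) * math.exp(-0.5))
print(heat_residual(1.0, 0.3, math.cos))
```

Choosing `sigma2=2.0` gives the corresponding check for diffusion constant $D = 2$.
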
Sometimes it is useful to consider the heat equation in a bounded domain. 
Let $B$ be a bounded region of $\mathbb{R}^d$ with boundary $\partial B$. 


Imagine an initial heat distribution on $B$, $f(x)$, $x \in B$, is given. Suppose 
also that the temperature is fixed at the boundary, i.e., there is a function 
$g(y)$, $y \in \partial B$, representing the fixed temperature at point $y$. If $u(t,x)$ denotes 
the temperature at $x$ at time $t$, then $u(t,x)$ satisfies 


(i) $\partial u/\partial t = (D/2)\, \Delta u$, $x \in B$; 

(ii) $u(t,x) = g(x)$, $x \in \partial B$; 

(iii) $u(0,x) = f(x)$, $x \in B$. 


The solution of (i) through (iii) can be written in terms of Brownian motion. 
Let $X_t$ be a $d$-dimensional Brownian motion with variance parameter $\sigma^2 = D$. 
Let $\tau = \tau_{\partial B}$ be the first time that the Brownian motion hits the boundary 
$\partial B$, 

\[ \tau = \inf\{t : X_t \in \partial B\}. \]

Then the solution can be written as 

\[ u(t,x) = E^x\left[ f(X_t)\, I\{\tau > t\} + g(X_\tau)\, I\{\tau \le t\} \right]. \]


In other words, at time $t$, take the average value of the following: $f(X_t)$ for 
the paths that have not hit $\partial B$ and $g(X_\tau)$ for those paths that have hit $\partial B$. 
As $t \to \infty$, the temperature approaches a steady-state distribution $u(x)$ with 
boundary value $g(x)$. The steady-state solution satisfies 


(i) $\Delta u(x) = 0$, $x \in B$, 

(ii) $u(x) = g(x)$, $x \in \partial B$. 

The solution is given by 

\[ u(x) = \lim_{t \to \infty} u(t,x) = E^x[g(X_\tau)]. \]


Example 1. Let $d = 1$ and suppose that $B = (a,b)$ with $0 < a < b < \infty$. 
Then $\partial B = \{a, b\}$. Take $a < x < b$ and consider 

\[ \tau = \inf\{t : X_t = a \text{ or } b\}, \]

where $X_t$ is a standard Brownian motion. Let $g$ be the function on $\partial B$, 
$g(a) = 0$, $g(b) = 1$. Then 

\[ u(x) = E^x[g(X_\tau)] = P^x\{X_\tau = b\} \]

(here we have used $P^x$ to denote a probability assuming $X_0 = x$). We know 
by above that $u(x)$ satisfies 

\[ u''(x) = 0, \quad a < x < b; \qquad u(a) = 0, \quad u(b) = 1. \]

We can solve this differential equation easily and we get 

\[ u(x) = \frac{x-a}{b-a}. \]


This is the Brownian motion analogue of the gambler’s ruin estimate. 
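
The linear answer can also be recovered from the steady-state equation itself. The sketch below discretizes $u'' = 0$ on a grid, where it becomes the averaging rule $u_i = (u_{i-1} + u_{i+1})/2$, which is the gambler's ruin recursion for simple random walk, and solves it by repeated sweeps (Gauss-Seidel relaxation). The function name, grid size, and number of sweeps are our assumptions for illustration.

```python
def exit_probability_grid(a, b, points=21, sweeps=5000):
    """Approximate u(x) = P^x{hit b before a} on an evenly spaced grid over [a, b].

    Repeatedly applies u_i = (u_{i-1} + u_{i+1}) / 2 in the interior,
    holding the boundary values u(a) = 0 and u(b) = 1 fixed.
    """
    u = [0.0] * points
    u[-1] = 1.0  # boundary value g(b) = 1; u[0] stays at g(a) = 0
    for _ in range(sweeps):
        for i in range(1, points - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1])
    xs = [a + (b - a) * i / (points - 1) for i in range(points)]
    return xs, u

xs, u = exit_probability_grid(1.0, 3.0)
for x, value in list(zip(xs, u))[::5]:
    print(x, value, (x - 1.0) / (3.0 - 1.0))
```

The relaxed values agree with $(x-a)/(b-a)$ at every grid point.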


Example 2. Let $d = 1$ and suppose that $B = (0, \pi)$ and that $X_0 = y \in (0, \pi)$. 
Let $u(t,x)$ be the solution of the heat equation 

\[ \frac{\partial u}{\partial t} = \frac{1}{2} \frac{\partial^2 u}{\partial x^2} \]

with boundary conditions $u(t,0) = u(t,\pi) = 0$ and such that as $t$ goes to 
0, $u(t,x)$ approaches the “delta function” at $y$. Then $u(t,x)$, $0 < x < \pi$, also 
denotes the density of the Brownian motion restricted to those paths that have 
not left $(0, \pi)$. The function $u$ can be found explicitly using the technique of 
separation of variables. First, it is easy to check that for all integers $n$, the 
function $e^{-n^2 t/2} \sin(nx)$ satisfies the heat equation and equals zero on the 
boundary. Therefore, for any choice of constants $C_n$, the function 

\[ u(t,x) = \sum_{n=1}^{\infty} C_n\, e^{-n^2 t/2} \sin(nx) \]


satisfies the heat equation and the boundary condition. If we want $u(0,x) = 
f(x)$, then we need to choose the constants so that 

\[ f(x) = \sum_{n=1}^{\infty} C_n \sin(nx). \]

Since 

\[ \int_0^\pi \sin(nx) \sin(mx)\, dx = 0 \quad \text{if } n \ne m, \]

we can see that $C_n$ must satisfy 

\[ \int_0^\pi f(x) \sin(nx)\, dx = C_n \int_0^\pi \sin^2(nx)\, dx = \frac{\pi}{2}\, C_n. \]

In the case where $f$ is the delta function at $y$, we choose 

\[ C_n = \frac{2}{\pi} \int_0^\pi f(x) \sin(nx)\, dx = \frac{2}{\pi} \sin(ny). \]


Hence, 

\[ u(t,x) = \frac{2}{\pi} \sum_{n=1}^{\infty} e^{-n^2 t/2} \sin(ny) \sin(nx). \]

As $t \to \infty$, 

\[ u(t,x) \sim \frac{2}{\pi}\, e^{-t/2} \sin y \sin x. \]
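
The series is easy to evaluate, which makes the large-$t$ behavior visible. The following sketch truncates the sum (the truncation level `terms`, the function names, and the midpoint integration are our assumptions) and compares the density with its leading term; integrating over $(0, \pi)$ gives the probability that the path has not yet left the interval.

```python
import math

def killed_density(t, x, y, terms=100):
    """(2/pi) * sum_{n>=1} exp(-n^2 t / 2) sin(n y) sin(n x), truncated after `terms` terms."""
    return (2.0 / math.pi) * sum(
        math.exp(-n * n * t / 2.0) * math.sin(n * y) * math.sin(n * x)
        for n in range(1, terms + 1)
    )

def leading_term(t, x, y):
    """(2/pi) exp(-t/2) sin(y) sin(x), the large-t approximation."""
    return (2.0 / math.pi) * math.exp(-t / 2.0) * math.sin(y) * math.sin(x)

def survival_probability(t, y, steps=1000):
    """Midpoint-rule integral of the density over (0, pi)."""
    h = math.pi / steps
    return sum(killed_density(t, (i + 0.5) * h, y) * h for i in range(steps))

print(killed_density(5.0, 1.0, 1.0) / leading_term(5.0, 1.0, 1.0))  # close to 1
print(survival_probability(0.5, math.pi / 2))                        # about 0.947
```

Already at $t = 5$ the first term of the series accounts for almost all of the density.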


Example 3. If $d > 1$, $D = 1$, $g = 0$, then one can try to write the solution of 
the heat equation in the form 

\[ u(t,x) = \sum_{n=1}^{\infty} C_n\, e^{-\lambda_n t/2}\, \phi_n(x), \]

where the functions $\phi_n$ are eigenfunctions of $\Delta$ with eigenvalue $-\lambda_n$ and 
Dirichlet boundary conditions, i.e., 

\[ \Delta \phi_n(x) = -\lambda_n \phi_n(x), \quad x \in B; \qquad \phi_n(x) = 0, \quad x \in \partial B. \]

In order to do this, we need to find a collection of such eigenfunctions that 
are orthogonal, 

\[ \int_B \phi_n(x)\, \phi_m(x)\, dx_1 \cdots dx_d = 0, \quad n \ne m, \]

and are complete, i.e., each $f$ can be written as 

\[ f(x) = \sum_{n=1}^{\infty} C_n \phi_n(x). \]

For a number of regions, such as balls in $\mathbb{R}^d$, the eigenfunctions and eigenvalues 
are known. For a much wider class of regions, one can prove the existence of 
such a collection of functions. See a book on partial differential equations for 
more information. If $B$ is a bounded, connected region, the eigenfunction $\phi_1$ 
associated to the largest eigenvalue $-\lambda_1$ (the eigenvalue of smallest absolute 
value) can be chosen so that if $X_0 = y$, the density $u(t,x)$ satisfies 

\[ u(t,x) \sim e^{-\lambda_1 t/2}\, \phi_1(y)\, \phi_1(x), \quad t \to \infty. \]

In the previous example, $\lambda_1 = 1$ and $\phi_1(x) = \sqrt{2/\pi}\, \sin x$. 


8.5 Recurrence and Transience 


In this section we ask whether the Brownian motion keeps returning to the 
origin. We have already answered this question for one-dimensional Brownian 
motion; if $X_t$ is a standard (one-dimensional) Brownian motion, then $X_t$ is 
recurrent, i.e., there are arbitrarily large times $t$ with $X_t = 0$. 

Now suppose $X_t$ is a standard $d$-dimensional Brownian motion. Let $0 < 
R_1 < R_2 < \infty$ and let $B = B(R_1, R_2)$ be the annulus 

\[ B = \{x \in \mathbb{R}^d : R_1 < |x| < R_2\}, \]

with boundary 

\[ \partial B = \{x \in \mathbb{R}^d : |x| = R_1 \text{ or } |x| = R_2\}. \]

Suppose $x \in B$. Let $f(x) = f(x, R_1, R_2)$ be the probability that a standard 
Brownian motion starting at $x$ hits the sphere $\{y : |y| = R_2\}$ before it hits 
the sphere $\{y : |y| = R_1\}$. If we let 

\[ \tau = \tau_{\partial B} = \inf\{t : X_t \in \partial B\}, \]

then we can write 

\[ f(x) = E^x[g(X_\tau)], \]

where $g(y) = 1$ for $|y| = R_2$ and $g(y) = 0$ for $|y| = R_1$. We saw in the last 
section that $f$ is the function satisfying 


(i) $\Delta f(x) = 0$, $x \in B$, 

(ii) $f(y) = 0$, $|y| = R_1$; $f(y) = 1$, $|y| = R_2$. 


To find $f$, we first note that the symmetry of Brownian motion implies $f(x) = 
\phi(|x|)$ for some $\phi$, i.e., the value of $f$ depends only on the absolute value of 
$x$. We can write the equation (i) in spherical coordinates. The form of the 
Laplacian $\Delta$ in spherical coordinates is somewhat messy; however, it is not so 
bad for functions $\phi(r)$ that depend only on the radius. One can check that 

\[ \Delta \phi(r) = \frac{d^2\phi}{dr^2} + \frac{d-1}{r} \frac{d\phi}{dr}. \]

The general solution to the equation 

\[ \phi''(r) + \frac{d-1}{r}\, \phi'(r) = 0 \]

is given by 

\[ \phi(r) = c_1 \ln r + c_2, \quad d = 2, \]

\[ \phi(r) = c_1 r^{2-d} + c_2, \quad d \ge 3. \]

[The second-order equation for $\phi(r)$ is a first-order equation for $\psi(r) = \phi'(r)$ 
which can be solved by separation of variables.] Putting in the boundary 
conditions $\phi(R_1) = 0$ and $\phi(R_2) = 1$, we see that 

\[ f(x) = \phi(|x|) = \frac{\ln|x| - \ln R_1}{\ln R_2 - \ln R_1}, \quad d = 2, \]

\[ f(x) = \phi(|x|) = \frac{R_1^{2-d} - |x|^{2-d}}{R_1^{2-d} - R_2^{2-d}}, \quad d \ge 3. \]


Consider now the two-dimensional case. Let $x \in \mathbb{R}^2$ and suppose that a 
Brownian motion starts at $x$ (or that the Brownian motion is at $x$ at some 
time $t$). Take any $\epsilon > 0$, and ask the question: What is the probability that 
the Brownian motion never returns to the disc of radius $\epsilon$ about 0? The 
argument above gives us the probability of reaching the circle of radius $R_2$ 
before reaching the disc. The probability we are interested in is therefore 

\[ \lim_{R_2 \to \infty} P^x\{|X_t| = R_2 \text{ before } |X_t| = \epsilon\} = \lim_{R_2 \to \infty} \frac{\ln|x| - \ln\epsilon}{\ln R_2 - \ln\epsilon} = 0. \]

Hence, with probability one the Brownian motion always returns to the disc 
of radius $\epsilon$ and hence it returns infinitely often and at arbitrarily large times. 
Does it ever return to the point 0, i.e., are there times $t$ with $X_t = 0$? Again, 
start the walk at $x \ne 0$. If there is a positive probability of reaching 0, then 
there must be an $R_2$ such that the probability of reaching 0 before reaching 
the circle of radius $R_2$ is positive. But this latter probability can be written 
as 

\[ \lim_{\epsilon \to 0} P^x\{|X_t| = \epsilon \text{ before } |X_t| = R_2\} = \lim_{\epsilon \to 0} \frac{\ln R_2 - \ln|x|}{\ln R_2 - \ln\epsilon} = 0. \]

Hence the Brownian motion never actually returns to 0. To summarize, the 
Brownian motion in two dimensions returns arbitrarily close to 0 infinitely 
often, but never actually returns to 0. We say that the Brownian motion in 
two dimensions is neighborhood recurrent but not point recurrent. 

Now consider $d \ge 3$. Again we take $\epsilon > 0$ and ask what is the probability 
that the Brownian motion starting at $x$ never returns to the ball of radius $\epsilon$. 
If $|x| > \epsilon$, this is given by 

\[ \lim_{R_2 \to \infty} \frac{\epsilon^{2-d} - |x|^{2-d}}{\epsilon^{2-d} - R_2^{2-d}} = 1 - \left( \frac{\epsilon}{|x|} \right)^{d-2}. \]

Since the probability of ever returning, $(\epsilon/|x|)^{d-2}$, is less than 1, we can see 
that eventually the Brownian motion escapes from any ball around the origin 
and hence goes off to infinity. 
We say that in this case the Brownian motion is transient. 
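
The contrast between $d = 2$ and $d \ge 3$ shows up clearly when the annulus formulas are evaluated for a growing outer radius. The sketch below uses the two formulas for $f(x)$ derived in this section. The function names and the particular radii are our choices for illustration.

```python
import math

def hit_outer_first(r, r1, r2, d):
    """Probability that d-dimensional Brownian motion started at radius r
    (with r1 < r < r2) reaches radius r2 before radius r1."""
    if d == 2:
        return (math.log(r) - math.log(r1)) / (math.log(r2) - math.log(r1))
    if d >= 3:
        p = 2 - d
        return (r1 ** p - r ** p) / (r1 ** p - r2 ** p)
    raise ValueError("the annulus formulas used here apply for d >= 2")

def escape_probabilities(r, eps, d, outer_radii=(1e2, 1e4, 1e8)):
    """Values of hit_outer_first for growing outer radius, approximating R_2 -> infinity."""
    return [hit_outer_first(r, eps, r2, d) for r2 in outer_radii]

print(escape_probabilities(2.0, 1.0, 2))  # decreases toward 0
print(escape_probabilities(2.0, 1.0, 3))  # approaches 1 - (1/2)^(3-2) = 0.5
```

In two dimensions the values decay only logarithmically in $R_2$, but they do tend to 0.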


8.6 Fractal Nature of Brownian Motion 


Let $X_t$ be a standard $d$-dimensional Brownian motion and let $A$ represent 
the (random) set of points visited by the path, 

\[ A = \{x \in \mathbb{R}^d : X_t = x \text{ for some } t\}. \]

In this section we will consider the dimension of the set $A$ for $d \ge 2$. 

In order to consider a bounded set, let $A_1 = A \cap \{x : |x| \le 1\}$. Fix an $\epsilon$ and 
let us try to cover $A_1$ with balls of diameter $\epsilon$. First consider the whole ball 
of radius 1, $\{x : |x| \le 1\}$, and cover it by balls of diameter $\epsilon$. The number of 
such balls needed is of the order of $\epsilon^{-d}$ (which is consistent with the fact that 
the ball is a dimension $d$ set). How many of these balls are needed to cover 
$A_1$? 

First, consider $d = 2$. By the argument given in the previous section, every 
open ball is visited by the Brownian motion. Hence $A$ intersects every ball 
and all the balls are needed. Hence the dimension of $A$ is two. 

Now consider $d > 2$. Take a typical ball of diameter $\epsilon$. What is the probability 
that it is needed in the covering, i.e., what is the probability that 
Brownian motion visits the ball? By the calculations done in the previous 
section, a ball of radius $\epsilon/2$ around a point $x$ (with $|x| > \epsilon/2$) is visited with 
probability $(\epsilon/2|x|)^{d-2}$. Hence, if $\epsilon$ is small and $|x|$ is of order 1, the probability 
is about a constant times $\epsilon^{d-2}$. Since each of the about $\epsilon^{-d}$ balls is 
chosen for the covering with probability about $\epsilon^{d-2}$, the total number of balls 
needed is about $\epsilon^{d-2} \epsilon^{-d} = \epsilon^{-2}$. Hence the dimension of the set $A$ is two. We 
have just sketched the idea behind the following fact: 


Fact. The path of a $d$-dimensional Brownian motion ($d \ge 2$) has fractal 
dimension two. 


8.7 Scaling Rules 


The fractal nature of Brownian motion is closely related to the scaling 
rule: if $X_t$ is a standard one-dimensional Brownian motion and $b > 0$, then 
$Y_t = b^{-1/2} X_{bt}$ is also a standard Brownian motion. A process satisfying the 
properties discussed on page 173 must satisfy this scaling rule. Suppose that 
we were willing to give up the condition that $X_t$ is a continuous function of $t$. 
Could we get different scaling laws? Is there a process that is symmetric about 
zero satisfying the other conditions that has a different scaling exponent $\lambda$, by 
which we mean that $Y_t = b^{-\lambda} X_{bt}$ has the same distribution as $X_t$? 

Let us suppose that such a process exists with scaling exponent $\lambda$. If we 
assume that $X_t$ has a finite variance then $\lambda$ must equal 1/2. This follows 
from the simple calculation 

\[ \mathrm{Var}(X_1) = \mathrm{Var}\left[ X_{1/n} + (X_{2/n} - X_{1/n}) + \cdots + (X_{n/n} - X_{(n-1)/n}) \right] \]

\[ = n\, \mathrm{Var}(X_{1/n}) = n\, \mathrm{Var}(n^{-\lambda} X_1) = n^{1-2\lambda}\, \mathrm{Var}(X_1), \]

which implies that $\lambda = 1/2$. 

Let 

\[ M_n = \max\left\{ |X_{1/n}|, |X_{2/n} - X_{1/n}|, \ldots, |X_{n/n} - X_{(n-1)/n}| \right\}. \]

If the paths have jumps, then we expect $P\{M_n > \epsilon\}$ not to go to zero as 
$n \to \infty$ for some value of $\epsilon$. However, assuming the paths are not too wild, we 
would expect that $P\{M_n > K\}$ would be less than, say 1/2, for some value of 
$K$. Note that 

\[ P\{M_n \le r\} = P\{|X_{j/n} - X_{(j-1)/n}| \le r \text{ for } j = 1, \ldots, n\} \]

\[ = P\{|X_{1/n}| \le r\}^n = P\{n^{-\lambda} |X_1| \le r\}^n = P\{|X_1| \le r n^{\lambda}\}^n. \]

If we recall that $(1 - \frac{a}{n})^n \to e^{-a}$, we can see that a good candidate for the 
distribution of $X_1$ would be one satisfying $P\{|X_1| \ge n^{\lambda}\} \sim c\, n^{-1}$, or 

\[ P\{|X_1| \ge y\} \sim c\, y^{-1/\lambda}. \]

If $\lambda < 1/2$, then it is not difficult to check that such an $X_1$ would have a finite 
variance. But this implies that $\lambda = 1/2$. Hence there are no examples with 
$\lambda < 1/2$. For $\lambda > 1/2$, there are examples and these are called the symmetric 
stable distributions and the corresponding processes are called symmetric stable 
processes. The density of these processes cannot be given explicitly except in 
the case $\lambda = 1$ which is the Cauchy distribution with density 

\[ f(x) = \frac{1}{\pi(1+x^2)}, \quad -\infty < x < \infty. \]
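
For the Cauchy case $\lambda = 1$ the tail condition can be checked directly, since $P\{|X_1| \ge y\} = (2/\pi)\arctan(1/y)$. The sketch below, with function names of our own, shows that $y\, P\{|X_1| \ge y\}$ tends to the constant $c = 2/\pi$ and that $P\{M_n \le r\} = P\{|X_1| \le rn\}^n$ settles down to $e^{-2/(\pi r)}$ rather than tending to 1.

```python
import math

def cauchy_tail(y):
    """P{|X| >= y} for the density 1 / (pi (1 + x^2)), y >= 0.

    Equals 1 - (2/pi) arctan(y), written as (2/pi) arctan(1/y) for accuracy at large y.
    """
    return (2.0 / math.pi) * math.atan2(1.0, y)

def max_increment_cdf(r, n):
    """P{M_n <= r} = P{|X_1| <= r n}^n, the scaling identity with lambda = 1."""
    return (1.0 - cauchy_tail(r * n)) ** n

print(1e6 * cauchy_tail(1e6), 2.0 / math.pi)
print(max_increment_cdf(1.0, 10 ** 6), math.exp(-2.0 / math.pi))
```

The nonzero limit of $P\{M_n > r\}$ reflects the jumps in the paths of the Cauchy process.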


8.8 Brownian Motion with Drift 


Consider a $d$-dimensional Brownian motion $X_t$ with variance parameter $\sigma^2$ 
starting at $x \in \mathbb{R}^d$. Let $\mu \in \mathbb{R}^d$ and 

\[ Y_t = X_t + t\mu. \]

Then $Y_t$ is called $d$-dimensional Brownian motion with drift $\mu$ and variance 
parameter $\sigma^2$ starting at $x$. One can check easily that $Y_t$ satisfies 


(i) $Y_0 = x$, 

(ii) if $s_1 \le t_1 \le s_2 \le t_2 \le \cdots \le s_n \le t_n$, then $Y_{t_1} - Y_{s_1}, \ldots, Y_{t_n} - Y_{s_n}$ are 
independent; 

(iii) $Y_t - Y_s$ has a normal distribution with mean $\mu(t-s)$ and covariance 
matrix $\sigma^2(t-s)I$; 

(iv) $Y_t$ is a continuous function of $t$. 

The motion $Y_t$ consists of a “straight line” motion in the direction $\mu$ with 
random fluctuations. Note that $E(Y_t) = x + t\mu$. 
The density of $Y_t$ given $Y_0 = x$, $p_t(x,y)$, is easily seen to be 

\[ p_t(x,y) = \frac{1}{(2\pi\sigma^2 t)^{d/2}}\, e^{-|y-x-t\mu|^2/2\sigma^2 t}. \]

This satisfies the Chapman-Kolmogorov equation, 

\[ p_{s+t}(x,y) = \int_{\mathbb{R}^d} p_s(x,z)\, p_t(z,y)\, dz_1 \cdots dz_d. \]


Suppose we start with a density on $\mathbb{R}^d$, $f(x)$. Consider the function 

\[ f(t,x) = E^x[f(Y_t)]. \]

For ease we will consider the case $d = 1$, $t = 0$. We again write $f$ in a Taylor 
series about $x$, 

\[ f(y) = f(x) + f'(x)(y-x) + \frac{1}{2} f''(x)(y-x)^2 + o((y-x)^2). \]

Hence, 

\[ E^x[f(Y_t)] = f(x) + f'(x)\, E^x[Y_t - x] + \frac{1}{2} f''(x)\, E^x[(Y_t - x)^2] + o((Y_t - x)^2). \]


A Brownian motion with drift $\mu$ and variance parameter $\sigma^2$ starting at $x$ can 
be obtained by letting $Y_t = X_t + t\mu + x$, where $X_t$ is a (zero drift) Brownian 
motion with variance parameter $\sigma^2$ starting at 0. Hence, 

\[ E^x[Y_t - x] = E[X_t + t\mu] = t\mu, \]

\[ E^x[(Y_t - x)^2] = E[(X_t + t\mu)^2] = (E[X_t + t\mu])^2 + \mathrm{Var}(X_t + t\mu) = (t\mu)^2 + \sigma^2 t. \]

Also, since $(Y_t - x)^2$ is order $t$, $o((Y_t - x)^2)$ is $o(t)$. Therefore, 

\[ \left.\frac{\partial f}{\partial t}\right|_{t=0} = \lim_{t \to 0} \frac{E^x[f(Y_t)] - E^x[f(Y_0)]}{t} = \mu f'(x) + \frac{\sigma^2}{2} f''(x). \]

We see that the inclusion of a drift has added a first derivative with respect 
to $x$. 
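
The one-dimensional formula $\mu f'(x) + (\sigma^2/2) f''(x)$ can be checked numerically. The sketch below is a rough illustration under our own assumptions: it computes $E^x[f(Y_t)]$ by a midpoint rule in a standard normal variable $Z$, writing $Y_t = x + t\mu + \sqrt{\sigma^2 t}\, Z$, and forms the difference quotient at a small time `dt`. The function names, the step sizes, and the test function $\sin$ are our choices.

```python
import math

def expected_value_with_drift(t, x, f, mu, sigma2, z_max=8.0, steps=4000):
    """E^x[f(Y_t)] for Y_t = x + t*mu + sqrt(sigma2*t)*Z, Z standard normal (midpoint rule)."""
    h = 2.0 * z_max / steps
    scale = math.sqrt(sigma2 * t)
    total = 0.0
    for i in range(steps):
        z = -z_max + (i + 0.5) * h
        weight = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
        total += f(x + t * mu + scale * z) * weight * h
    return total

def time_derivative_at_zero(x, f, mu, sigma2, dt=1e-5):
    """Difference quotient (E^x[f(Y_dt)] - f(x)) / dt approximating df/dt at t = 0."""
    return (expected_value_with_drift(dt, x, f, mu, sigma2) - f(x)) / dt

x, mu, sigma2 = 0.7, 1.5, 2.0
numeric = time_derivative_at_zero(x, math.sin, mu, sigma2)
# For f = sin: f' = cos and f'' = -sin.
formula = mu * math.cos(x) + 0.5 * sigma2 * (-math.sin(x))
print(numeric, formula)
```

The two printed values agree to roughly the size of `dt`.
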
In $d$ dimensions, if the drift $\mu = (\mu_1, \ldots, \mu_d)$, we would get 

\[ \frac{\partial f}{\partial t} = \sum_{i=1}^{d} \mu_i \frac{\partial f}{\partial x_i} + \frac{\sigma^2}{2} \Delta f. \]


8.9 Exercises 


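8.1 Let $X$ be a normal random variable, mean 0 variance 1. Show that if 
$a > 0$, 

\[ P\{X \ge a\} \le \frac{2}{a\sqrt{2\pi}}\, e^{-a^2/2}. \]

(Hint: 

\[ \int_a^\infty e^{-x^2/2}\, dx \le \int_a^\infty e^{-ax/2}\, dx.) \]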


8.2 Let $X_{n1}, \ldots, X_{nn}$ be independent normal random variables with mean 
0 and variance $1/n$. Then 

\[ X = X_{n1} + \cdots + X_{nn} \]

is a normal random variable with mean 0, variance 1. Let 

\[ M_n = \max\{X_{n1}, \ldots, X_{nn}\}. \]

Show that for every $\epsilon > 0$, 

\[ \lim_{n \to \infty} P\{M_n > \epsilon\} = 0. \]

(Hint: it will be useful to use the estimate from Problem 8.1. It may also be 
useful to remember that if $Y$ is normal mean 0, variance $\sigma^2$, then $\sigma^{-1} Y$ is 
normal mean 0, variance 1.) 


8.3 Let $X_{n1}, \ldots, X_{nn}$ be independent Poisson random variables with mean 
$1/n$. Then 

\[ X = X_{n1} + \cdots + X_{nn} \]

is a Poisson random variable with mean 1. Let 

\[ M_n = \max\{X_{n1}, \ldots, X_{nn}\}. \]

Find 

\[ \lim_{n \to \infty} P\{M_n > 1/2\}. \]


8.4 Let $X_t$ denote a standard (one-dimensional) Brownian motion. Find the 
following probabilities. Give your answers as rational numbers or decimals to 
at least three places. 
( 
( 
( 
(d) $X_t = 0$ for some $t$ with $2 \le t \le 3$ 
(e) $X_t < 4$ for all $t$ with $0 < t < 3$ 

(f) $X_t > 0$ for all $t > 10$. 

8.5 Random variables $Y_1, \ldots, Y_n$ have a joint normal distribution with mean 
0 if there exist independent random variables $X_1, \ldots, X_n$, each normal mean 
0, variance 1, and constants $a_{ij}$ such that 

\[ Y_i = a_{i1} X_1 + \cdots + a_{in} X_n. \]

Let $X_t$ be a standard Brownian motion. Let $s_1 \le s_2 \le \cdots \le s_n$. Explain why 
it follows from the definition of a Brownian motion that $X_{s_1}, \ldots, X_{s_n}$ have a 
joint normal distribution. 


8.6 If $Y_1, \ldots, Y_n$ have a joint normal distribution with mean 0, then the 
covariance matrix is the matrix $\Gamma$ whose $(i,j)$ entry is $E(Y_i Y_j)$. Let $X_t$ and 
$s_1, \ldots, s_n$ be as in Exercise 8.5. 

(a) Find the covariance matrix $\Gamma$ for $X_{s_1}, \ldots, X_{s_n}$. 
(b) The moment generating function (mgf) for $Y_1, \ldots, Y_n$ is the function 
$f : \mathbb{R}^n \to \mathbb{R}$ defined by 

\[ f(t_1, \ldots, t_n) = E\left[ e^{t_1 Y_1 + \cdots + t_n Y_n} \right]. \]

Find the mgf for $Y_1, \ldots, Y_n$ in terms of its covariance matrix $\Gamma$. 

(c) If two distributions have the same mgf, then the two distributions are 
the same. Use this fact to prove the following: if $Y_1, \ldots, Y_n$ have a mean 0 
joint normal distribution, and $E[Y_i Y_j] = 0$ for all $i \ne j$, then $Y_1, \ldots, Y_n$ are 
independent. 


8.7 Suppose $X_t$ is a standard Brownian motion and $Y_t = a^{-1/2} X_{at}$ with 
$a > 0$. Show that $Y_t$ is a standard Brownian motion. 


8.8 Suppose $X_t$ is a standard Brownian motion and $Y_t = t X_{1/t}$. Show that 
$Y_t$ is a standard Brownian motion. (Hint: it may be useful to use Exercise 
8.6 (c).) 


8.9 Let $X_t$ be a standard Brownian motion. Compute the following conditional 
probability: 

\[ P\{X_2 > 0 \mid X_1 > 0\}. \]

Are the events $\{X_1 > 0\}$ and $\{X_2 > 0\}$ independent? 


8.10 Let $X_t$ and $Y_t$ be independent standard (one-dimensional) Brownian 
motions. 

(a) Show that $Z_t = X_t - Y_t$ is a Brownian motion. What is the variance 
parameter for $Z_t$? 

(b) True or False: With probability 1, $X_t = Y_t$ for infinitely many values 
of $t$. 


8.11 Let $X_t$ be a standard (one-dimensional) Brownian motion starting at 
0 and let 

\[ M = \max\{X_t : 0 \le t \le 1\}. \]

Find the density for $M$ and compute its expectation and variance. 


8.12 Let $X_t$ be a standard (one-dimensional) Brownian motion starting at 
0 and let 

\[ T = \min\{t : |X_t| = 1\}, \qquad \tilde{T} = \min\{t : X_t = 1\}. \]

(a) Show that there exist positive constants $c, \beta$ such that for all $t > 0$, 

\[ P\{T > t\} \le c\, e^{-\beta t}. \]

Conclude that $E[T] < \infty$. 

(b) Use the reflection principle to find the density of $\tilde{T}$, and show that 

\[ E[\tilde{T}] = \infty. \]


8.13 Let $X_t$, $T$ be as in Exercise 8.12 and let 

\[ T^* = \min\{t : X_t = 1 \text{ or } X_t = -3\}. \]

(a) Explain why $X_T$ and $T$ are independent random variables. 
(b) Show that $T^*$ and $X_{T^*}$ are not independent. 


8.14 Let $X_t$ be a standard (one-dimensional) Brownian motion started at a 
point $y$ chosen uniformly on the interval $(0,1)$. Suppose the motion is stopped 
whenever it reaches 0 or 1, and let $u(t,x)$, $0 < x < 1$, denote the density of 
the position $X_t$ restricted to those paths that have not left $(0,1)$. Find $u(t,x)$ 
explicitly in terms of an infinite series and use the series to find the function 
$h$ and the constant $\beta$ such that as $t \to \infty$, 

\[ u(t,x) \sim e^{-\beta t}\, h(x). \]


8.15 Let the Cantor-like set $A$ be defined as follows. Let $A_0 = [0,1]$, 

\[ A_1 = \left[0, \frac{2}{5}\right] \cup \left[\frac{3}{5}, 1\right], \]

and $A_n$ is obtained from $A_{n-1}$ by removing the “middle fifth” from each 
interval in $A_{n-1}$. Let 

\[ A = \bigcap_{n=1}^{\infty} A_n. \]

What is the fractal dimension of $A$? 


8.16 Suppose that $X$ has a Cauchy distribution, i.e., has density 

\[ f(x) = \frac{1}{\pi(1+x^2)}, \quad -\infty < x < \infty. \]

(a) If $a > 0$, let $Y = a^{-1} X$. What is the density of $Y$? 

(b) Suppose that $Y, Z$ are independent random variables each with a Cauchy 
distribution. Show that the average $(Y+Z)/2$ also has a Cauchy distribution. 

(c) For which $r > 0$ is $E[|X|^r] < \infty$? 


8.17 Let $X_t = (X_t^1, X_t^2)$ denote a standard two-dimensional Brownian motion. 
Let 

\[ T_t = \min\{s : X_s^1 = t\}, \qquad Y_t = X^2_{T_t}. \]

(a) Which of the following properties does the process $Y_t$ satisfy? 


(i) $Y_0 = 0$, 
(ii) For $s_1 \le t_1 \le s_2 \le t_2 \le \cdots \le s_n \le t_n$, the random variables 
$Y_{t_1} - Y_{s_1}, \ldots, Y_{t_n} - Y_{s_n}$ are independent; 

(iii) If $0 < s < t$, then the distribution of $Y_t - Y_s$ is the same as that of 
$Y_{t-s}$; 
(iv) $Y_t$ is a continuous function of $t$. 


(b) For which $\lambda > 0$ does the process $Z_t = a^{-\lambda} Y_{at}$ have the same distribution 
as $Y_t$? 


Chapter 9 


Stochastic Integration 


9.1 Integration with Respect to Random Walk 


The goal of this chapter is to introduce the idea of integration with respect 
to Brownian motion. To give the reader a sense for the integral, we will start 
by discussing integration with respect to simple random walk. Let $X_1, X_2, \ldots$ 
be independent random variables, $P\{X_i = 1\} = P\{X_i = -1\} = 1/2$ and let 
$S_n$ denote the corresponding simple random walk 

\[ S_n = X_1 + \cdots + X_n. \]

As in Section 5.2, Example 3, we think of $X_n$ as being the result of a game 
at time $n$ and we can consider possible betting strategies on the games. 

Let $\mathcal{F}_n$ denote the information contained in $X_1, \ldots, X_n$. Let $B_n$ be the 
“bet” on the $n$th game. $B_n$ can be either positive or negative, a negative 
value being the same as betting that $X_n$ will turn up $-1$. The important 
assumption that we make is that the bettor must make the bet using only the 
information available up to, but not including, the $n$th game, i.e., we assume 
that $B_n$ is measurable with respect to $\mathcal{F}_{n-1}$. The winnings up to time $n$, $Z_n$, 
can be written as 

\[ Z_n = \sum_{i=1}^{n} B_i X_i = \sum_{i=1}^{n} B_i [S_i - S_{i-1}] = \sum_{i=1}^{n} B_i\, \Delta S_i, \]

where we write $\Delta S_i = S_i - S_{i-1}$. We call $Z_n$ the integral of $B_n$ with respect 
to $S_n$. 

There are two important properties that this integral satisfies. The first
was shown in Section 5.2, Example 3: the process $Z_n$ is a martingale with
respect to $\mathcal{F}_n$, i.e., if $m < n$,

\[ E(Z_n \mid \mathcal{F}_m) = Z_m. \]

In particular, $E(Z_n) = 0$. The second property deals with the second moment
of $Z_n$. Assume that the bets $B_n$ have finite second moments,
$E(B_n^2) < \infty$. Then


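\[ \mathrm{Var}(Z_n) = E(Z_n^2) = \sum_{i=1}^n E(B_i^2). \]

To see this, we expand the square to write

\[ Z_n^2 = \sum_{i=1}^n B_i^2 X_i^2 + 2 \sum_{1 \le i < j \le n} B_i B_j X_i X_j. \]

Note that $X_i^2 = 1$ and hence

\[ E\Big( \sum_{i=1}^n B_i^2 X_i^2 \Big) = \sum_{i=1}^n E(B_i^2). \]

Suppose $i < j$. Then $B_i, X_i, B_j$ are all measurable with respect to
$\mathcal{F}_{j-1}$ while $X_j$ is independent of $\mathcal{F}_{j-1}$. Using
(5.3), we see that

\[ E(B_i B_j X_i X_j \mid \mathcal{F}_{j-1})
   = B_i B_j X_i \, E(X_j \mid \mathcal{F}_{j-1})
   = B_i B_j X_i \, E(X_j) = 0, \]

and hence

\[ E(B_i B_j X_i X_j) = E[E(B_i B_j X_i X_j \mid \mathcal{F}_{j-1})] = 0. \]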
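
The two properties can be checked exactly for a small number of games by
enumerating every path of the walk. The sketch below is only an illustration:
the betting rule $B_i = S_{i-1}$ and the function names are assumptions chosen
for the example, not part of the text. For this rule $E(B_i^2) = i - 1$, so the
second-moment formula predicts $E(Z_n^2) = n(n-1)/2$.

```python
from itertools import product


def discrete_integral_moments(n):
    """Exact E(Z_n) and E(Z_n^2) for the illustrative bet B_i = S_{i-1}."""
    total_z = 0
    total_z2 = 0
    for steps in product((1, -1), repeat=n):  # all 2^n equally likely paths
        s = 0  # S_{i-1}, the walk before the current game
        z = 0  # winnings so far
        for x in steps:
            bet = s  # depends only on X_1, ..., X_{i-1}
            z += bet * x
            s += x
        total_z += z
        total_z2 += z * z
    paths = 2 ** n
    return total_z / paths, total_z2 / paths


def sum_of_bet_second_moments(n):
    # E(B_i^2) = E(S_{i-1}^2) = i - 1 for simple random walk
    return sum(i - 1 for i in range(1, n + 1))


mean, second_moment = discrete_integral_moments(8)
print(mean, second_moment, sum_of_bet_second_moments(8))
```

Because the enumeration covers all $2^n$ paths, no sampling error is involved;
the cost grows exponentially in $n$, so this is practical only for short walks.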


9.2 Integration with Respect to Brownian Motion 


Here we describe a continuous analogue of the discrete integral given in
the last section. Instead of a simple random walk, we will take a standard
(one-dimensional) Brownian motion, which we will write $W_t$. We can think
of this as a continuous fair game such that if one bets one unit for the entire
period $[s,t]$ then one’s winnings in this time period would be $W_t - W_s$.

Let $Y_t$ denote the amount that is bet at time $t$. What we would like to do
is define

\[ Z_t = \int_0^t Y_s \, dW_s. \]

The process $Z_t$ should denote the amount won in this game up to time $t$ if
the amount bet at time $s$ is $Y_s$. It is a nontrivial mathematical problem to
define this integral. The roughness of the paths of the Brownian motion
prevents one from defining the integral as a “Riemann-Stieltjes” integral.

We will make two assumptions about our betting strategy $Y_t$. The first
assumption is that $E(Y_t^2) < \infty$ for all $t$ and for each $t$,

\[ \int_0^t E(Y_s^2) \, ds < \infty. \]

This condition will certainly be satisfied if we restrict ourselves to bounded
betting strategies. The second assumption is critical and corresponds to our
assumption in the discrete case that the bettor cannot look into the future to
determine the bet. Let $\mathcal{F}_t$ denote the information contained in the
Brownian motion up through time $t$. We assume that $Y_t$ is
$\mathcal{F}_t$-measurable. In other words, the bettor can see the entire
Brownian motion up through time $t$ before choosing the bet, but cannot see
anything after time $t$.

It is not too difficult to define the integral if we make the restrictive
assumption that the bettor can change the bet only at a certain finite set of
times, say $t_1 < t_2 < \cdots < t_n$. The bets then take the form

\[ Y_t = \begin{cases}
     Y_0, & 0 \le t \le t_1, \\
     Y_1, & t_1 < t \le t_2, \\
     \;\vdots \\
     Y_n, & t_n < t < \infty.
   \end{cases} \]

Here $Y_0, \ldots, Y_n$ are random variables with $E(Y_i^2) < \infty$, and $Y_i$
must be measurable with respect to $\mathcal{F}_{t_i}$ (where $t_0 = 0$). We
will call a betting strategy that can change at only a finite number of times a
simple strategy. For a simple strategy, we define the stochastic integral for
$t_j < t \le t_{j+1}$ by

\[ Z_t = \int_0^t Y_s \, dW_s
       = \sum_{i=1}^{j} Y_{i-1} [W_{t_i} - W_{t_{i-1}}] + Y_j [W_t - W_{t_j}]. \]


There are three important properties that the stochastic integral of a simple
strategy satisfies. The first is linearity: if $X_t$ and $Y_t$ are two simple
strategies and $a, b$ are real numbers, then $aX_t + bY_t$ is a simple strategy
and

\[ \int_0^t (aX_s + bY_s) \, dW_s
   = a \int_0^t X_s \, dW_s + b \int_0^t Y_s \, dW_s. \]

This can be easily checked.

The other two properties are direct analogues of the properties of the
discrete stochastic integral of the previous section. We say a continuous-time
process $Z_t$ is a martingale with respect to $\mathcal{F}_t$ if each $Z_t$ is
$\mathcal{F}_t$-measurable; $E(|Z_t|) < \infty$ for each $t$; and if $s < t$,

\[ E(Z_t \mid \mathcal{F}_s) = Z_s. \qquad (9.1) \]

The second property is that the stochastic integral $Z_t$ as defined above is
a martingale with respect to the information $\mathcal{F}_t$ derived from the
Brownian motion. It is easy to see that $Z_t$ is $\mathcal{F}_t$-measurable and
the condition $E(|Z_t|) < \infty$ follows from the fact that the second moments
of the $Y_i$ exist. We will now verify (9.1). First assume
$t_j \le s < t \le t_{j+1}$ for some $j$. Then we can write

\[ Z_t = Z_s + Y_j [W_t - W_s]. \]

Since $Y_j$ and $Z_s$ are $\mathcal{F}_s$-measurable and $W_t - W_s$ is
independent of $\mathcal{F}_s$,

\[ E(Z_t \mid \mathcal{F}_s) = Z_s + Y_j \, E(W_t - W_s \mid \mathcal{F}_s)
   = Z_s + Y_j \, E(W_t - W_s) = Z_s. \]

In particular, if $t_j < t \le t_{j+1}$,

\[ E(Z_{t_{j+1}} \mid \mathcal{F}_t) = Z_t, \qquad
   E(Z_t \mid \mathcal{F}_{t_j}) = Z_{t_j}. \]

Note that

\[ E(Z_t \mid \mathcal{F}_{t_{j-1}})
   = E(E(Z_t \mid \mathcal{F}_{t_j}) \mid \mathcal{F}_{t_{j-1}})
   = E(Z_{t_j} \mid \mathcal{F}_{t_{j-1}}) = Z_{t_{j-1}}, \]

and by iteration we can see that for all $i \le j$,
$E(Z_t \mid \mathcal{F}_{t_i}) = Z_{t_i}$. Finally, if $t_i \le s < t_{i+1}$ and
$t_j < t \le t_{j+1}$ for some $i < j$, then

\[ E(Z_t \mid \mathcal{F}_s)
   = E(E(Z_t \mid \mathcal{F}_{t_{i+1}}) \mid \mathcal{F}_s)
   = E(Z_{t_{i+1}} \mid \mathcal{F}_s) = Z_s. \]

This gives (9.1).
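The third property gives a way to calculate the second moment,

\[ \mathrm{Var}(Z_t) = E(Z_t^2) = \int_0^t E(Y_s^2) \, ds. \qquad (9.2) \]

The right-hand side is a standard calculus “ds” integral. To prove this, assume
that $t_j < t \le t_{j+1}$. Note that $E(Y_s^2)$ is a step function in $s$ so

\[ \int_0^t E(Y_s^2) \, ds
   = \sum_{i=1}^{j} E(Y_{i-1}^2)(t_i - t_{i-1}) + E(Y_j^2)(t - t_j). \]

If we expand the square, we see that

\[ Z_t^2 = \sum_{i=1}^{j} Y_{i-1}^2 [W_{t_i} - W_{t_{i-1}}]^2
   + Y_j^2 [W_t - W_{t_j}]^2 + (\text{cross terms}), \]

where “cross terms” represents a sum of terms of the form

\[ Y_{i-1} Y_{k-1} [W_{t_i} - W_{t_{i-1}}][W_{t_k} - W_{t_{k-1}}], \qquad i < k, \]

or

\[ Y_{i-1} Y_j [W_{t_i} - W_{t_{i-1}}][W_t - W_{t_j}]. \]

If $i < k$,

\begin{align*}
E(Y_{i-1} Y_{k-1} & [W_{t_i} - W_{t_{i-1}}][W_{t_k} - W_{t_{k-1}}]
     \mid \mathcal{F}_{t_{k-1}}) \\
  &= Y_{i-1} Y_{k-1} [W_{t_i} - W_{t_{i-1}}] \,
     E(W_{t_k} - W_{t_{k-1}} \mid \mathcal{F}_{t_{k-1}}) \\
  &= Y_{i-1} Y_{k-1} [W_{t_i} - W_{t_{i-1}}] \, E(W_{t_k} - W_{t_{k-1}}) = 0,
\end{align*}

and hence

\begin{align*}
E(Y_{i-1} Y_{k-1} & [W_{t_i} - W_{t_{i-1}}][W_{t_k} - W_{t_{k-1}}]) \\
  &= E[E(Y_{i-1} Y_{k-1} [W_{t_i} - W_{t_{i-1}}][W_{t_k} - W_{t_{k-1}}]
     \mid \mathcal{F}_{t_{k-1}})] = 0.
\end{align*}

Similarly,

\[ E(Y_{i-1} Y_j [W_{t_i} - W_{t_{i-1}}][W_t - W_{t_j}]) = 0. \]

Therefore

\[ E(Z_t^2) = \sum_{i=1}^{j} E(Y_{i-1}^2 [W_{t_i} - W_{t_{i-1}}]^2)
   + E(Y_j^2 [W_t - W_{t_j}]^2). \]

Note that

\begin{align*}
E(Y_{i-1}^2 [W_{t_i} - W_{t_{i-1}}]^2 \mid \mathcal{F}_{t_{i-1}})
  &= Y_{i-1}^2 \, E([W_{t_i} - W_{t_{i-1}}]^2 \mid \mathcal{F}_{t_{i-1}}) \\
  &= Y_{i-1}^2 \, E([W_{t_i} - W_{t_{i-1}}]^2) \\
  &= Y_{i-1}^2 (t_i - t_{i-1}).
\end{align*}

Hence,

\begin{align*}
E(Y_{i-1}^2 [W_{t_i} - W_{t_{i-1}}]^2)
  &= E(E(Y_{i-1}^2 [W_{t_i} - W_{t_{i-1}}]^2 \mid \mathcal{F}_{t_{i-1}})) \\
  &= E(Y_{i-1}^2)(t_i - t_{i-1}).
\end{align*}

Similarly,

\[ E(Y_j^2 [W_t - W_{t_j}]^2) = E(Y_j^2)(t - t_j). \]

This proves (9.2).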
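
A Monte Carlo experiment gives a rough numerical check of the martingale
property and of (9.2) for one simple strategy. The strategy used here (bet
$Y_0 = 1$ on $[0, t_1]$, then $Y_1 = W_{t_1}$ on $(t_1, t]$), the sample size,
and the function name are assumptions made for the illustration. For this
choice (9.2) predicts $E(Z_t^2) = 1 \cdot t_1 + E(W_{t_1}^2)(t - t_1)$.

```python
import numpy as np


def simple_strategy_integral(n_paths, t1=0.5, t=1.0, seed=0):
    """Sample Z_t for the strategy Y = 1 on [0, t1], Y = W_{t1} on (t1, t]."""
    rng = np.random.default_rng(seed)
    w_t1 = rng.normal(0.0, np.sqrt(t1), n_paths)    # W_{t1}
    dw = rng.normal(0.0, np.sqrt(t - t1), n_paths)  # W_t - W_{t1}, independent
    y0 = 1.0
    y1 = w_t1  # chosen using only information up to time t1
    return y0 * w_t1 + y1 * dw


z = simple_strategy_integral(200_000)
predicted = 1.0 * 0.5 + 0.5 * 0.5  # E(Y_0^2) t1 + E(Y_1^2) (t - t1)
print(z.mean(), (z ** 2).mean(), predicted)
```

The sample mean should be close to 0 and the sample second moment close to the
predicted value, up to Monte Carlo error.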

To define the stochastic integral for betting rules $Y_s$ that are not simple,
we do the standard mathematical procedure for defining continuous objects:
approximate by discrete and take a limit. Let $Y_s$ be measurable with respect
to $\mathcal{F}_s$, satisfying the second moment conditions listed above. A
little more must be assumed about the $Y_s$ to be mathematically precise: the
paths of $Y_s$ (i.e., $Y_s$ considered as a function of $s$) should be right
continuous and have left limits; we will not worry about this in our informal
treatment. For each $n > 0$, define the approximate strategy $Y_s^{(n)}$ by

\[ Y_s^{(n)} = n \int_{(k-1)/n}^{k/n} Y_r \, dr, \qquad
   \frac{k}{n} \le s < \frac{k+1}{n}, \]

where we set $Y_s^{(n)} = 0$ for $s < 1/n$. We have arranged the approximation
so that for each $t$, $Y_s^{(n)}$, $0 \le s \le t$, is a simple strategy that is
$\mathcal{F}_s$-measurable. The key estimate that can be proved (we will not do
it) is that

\[ Y^{(n)} \to Y \]

in the sense that for each $t$

\[ \lim_{n \to \infty} \int_0^t E([Y_s - Y_s^{(n)}]^2) \, ds = 0. \]

This allows us to define the stochastic integral

\[ Z_t = \int_0^t Y_s \, dW_s, \]

by saying that $Z_t$ is the mean-square limit of the random variables

\[ Z_t^{(n)} = \int_0^t Y_s^{(n)} \, dW_s. \]

The first and third properties of the stochastic integral allow this definition
to work since as $n, m \to \infty$,

\[ E((Z_t^{(n)} - Z_t^{(m)})^2)
   = \int_0^t E([Y_s^{(n)} - Y_s^{(m)}]^2) \, ds \to 0. \]

In the process of showing the limit exists, one also shows that the three
properties of the integral still hold.


Linearity:

\[ \int_0^t [aX_s + bY_s] \, dW_s
   = a \int_0^t X_s \, dW_s + b \int_0^t Y_s \, dW_s. \]

Martingale Property: $Z_t = \int_0^t Y_s \, dW_s$ is a martingale with respect
to $\mathcal{F}_t$. In particular, $E(Z_t) = 0$ for all $t$.

Second Moment Calculation:

\[ \mathrm{Var}\Big( \int_0^t Y_s \, dW_s \Big) = \int_0^t E(Y_s^2) \, ds. \]

The relationship

\[ Z_t = \int_0^t Y_s \, dW_s \]

is often written in the differential form

\[ dZ_t = Y_t \, dW_t. \]

The process $Z_t$ can be thought of as a process that at time $t$ looks like a
Brownian motion with variance parameter $Y_t^2$ (recall that if $W_t$ is a
standard Brownian motion, then $\sigma W_t$ is a Brownian motion with variance
parameter $\sigma^2$). Sometimes one has a process

\[ Z_t = \int_0^t X_s \, ds + \int_0^t Y_s \, dW_s, \]

where the “ds” integral is a standard calculus integral. In differential form
this is written

\[ dZ_t = X_t \, dt + Y_t \, dW_t. \]

This represents a process that at time $t$ looks like a Brownian motion with
drift $X_t$ and variance parameter $Y_t^2$.


9.3 Itô’s Formula


How does one calculate stochastic integrals? As an example, consider the
integral

\[ Z_t = \int_0^t W_s \, dW_s. \]

$W_s$ is $\mathcal{F}_s$-measurable and this integral is well defined. One might
hope that standard calculus rules would work for stochastic integrals in which
case we would have

\[ \int_0^t W_s \, dW_s = \frac{1}{2} W_t^2 - \frac{1}{2} W_0^2
   = \frac{1}{2} W_t^2. \]

However, a quick examination of this equation shows that it cannot be true:
the left-hand side is a random variable with expectation 0 but the right-hand
side has expectation $t/2$. In this section, we derive a formula that will
allow us to calculate this integral exactly. This formula is usually called
Itô’s formula and it is the fundamental theorem of stochastic calculus.

Let us start by reviewing the ordinary fundamental theorem of calculus.
Suppose we have a continuously differentiable function $f(t)$. Around each
$t_0$ we can expand $f(t)$,

\[ f(t) = f(t_0) + f'(t_0)(t - t_0) + o(t - t_0). \]

We can write $f(t)$ as a telescoping sum

\[ f(t) = f(0) + \sum_{j=0}^{n-1}
   \Big[ f\Big(\frac{(j+1)t}{n}\Big) - f\Big(\frac{jt}{n}\Big) \Big]. \]

We now use the Taylor series about $jt/n$ to write

\[ f\Big(\frac{(j+1)t}{n}\Big) - f\Big(\frac{jt}{n}\Big)
   = f'\Big(\frac{jt}{n}\Big) \frac{t}{n} + o\Big(\frac{t}{n}\Big) \]

and

\[ f(t) - f(0) = \sum_{j=0}^{n-1} f'\Big(\frac{jt}{n}\Big) \frac{t}{n}
   + \sum_{j=0}^{n-1} o\Big(\frac{t}{n}\Big). \]

As $n \to \infty$ the second term on the right tends to 0 and the first term
tends to the integral of $f'$. We therefore get

\[ f(t) - f(0) = \int_0^t f'(s) \, ds, \]

which we all know very well.


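Now let $W_t$ be a Brownian motion, and $f$ a function with at least two
continuous derivatives. At each $x_0$ we can expand $f(x)$,

\[ f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2
   + o((x - x_0)^2). \]

Write $f(W_t)$ as a telescoping sum,

\[ f(W_t) = f(W_0) + \sum_{j=0}^{n-1}
   [f(W_{(j+1)t/n}) - f(W_{jt/n})]. \]

By using the Taylor series expansion about $W_{jt/n}$, we can write

\begin{align*}
f(W_{(j+1)t/n}) - f(W_{jt/n})
  &= f'(W_{jt/n}) [W_{(j+1)t/n} - W_{jt/n}] \\
  &\quad + \frac{1}{2} f''(W_{jt/n}) [W_{(j+1)t/n} - W_{jt/n}]^2
   + o([W_{(j+1)t/n} - W_{jt/n}]^2).
\end{align*}

The $o(\cdot)$ is smaller than order $n^{-1}$ since
$[W_{(j+1)t/n} - W_{jt/n}]^2$ is of order $(t/n)$. We then get

\begin{align*}
f(W_t) - f(W_0)
  &= \sum_{j=0}^{n-1} f'(W_{jt/n}) [W_{(j+1)t/n} - W_{jt/n}] \\
  &\quad + \frac{1}{2} \sum_{j=0}^{n-1} f''(W_{jt/n})
     [W_{(j+1)t/n} - W_{jt/n}]^2
   + \sum_{j=0}^{n-1} o\Big(\frac{1}{n}\Big). \qquad (9.3)
\end{align*}

As $n \to \infty$, the third term on the right goes to 0. Since $f'$ is
continuous, the first term will approach

\[ \int_0^t f'(W_s) \, dW_s. \]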




To see what the second term converges to, let us consider the general question
of the limit of

\[ \sum_{j=0}^{n-1} g(W_{jt/n}) [W_{(j+1)t/n} - W_{jt/n}]^2, \]

where $g$ is a continuous function. First consider the case where $g$ is
identically 1. Let

\[ Q_t^{(n)} = \sum_{j=0}^{n-1} (W_{(j+1)t/n} - W_{jt/n})^2. \]

The limit

\[ Q_t = \lim_{n \to \infty} Q_t^{(n)} \]

is often called the quadratic variation of $W_t$. The random variable
$[W_{(j+1)t/n} - W_{jt/n}]^2$ has the same distribution as $(t/n) U^2$, where
$U$ is normal with mean 0 and variance 1. Note that

\[ E(U^2) = 1, \qquad \mathrm{Var}(U^2) = E(U^4) - [E(U^2)]^2 = 2. \]

Hence, since the increments of $W$ are independent,

\[ E(Q_t^{(n)}) = \sum_{j=0}^{n-1} E[(W_{(j+1)t/n} - W_{jt/n})^2] = t, \]

\[ \mathrm{Var}(Q_t^{(n)})
   = \sum_{j=0}^{n-1} \mathrm{Var}[(W_{(j+1)t/n} - W_{jt/n})^2]
   = n \, \mathrm{Var}((t/n) U^2) = \frac{2t^2}{n}. \]

As $n \to \infty$, the expectation of $Q_t^{(n)}$ stays constant but the
variance goes to 0. In other words, the limiting random variable $Q_t$ is just
a constant, and the quadratic variation of Brownian motion up to time $t$ is
the constant random variable equal to $t$.
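
The concentration of $Q_t^{(n)}$ around $t$ is easy to observe numerically. The
following sketch samples the $n$ independent increments directly; the function
name, the seed, and the values of $n$ are illustrative assumptions.

```python
import numpy as np


def quadratic_variation(t, n, seed=0):
    """Return Q_t^(n), the sum of n squared Brownian increments over [0, t]."""
    rng = np.random.default_rng(seed)
    dw = rng.normal(0.0, np.sqrt(t / n), n)  # increments have variance t/n
    return float(np.sum(dw ** 2))


for n in (10, 1_000, 100_000):
    print(n, quadratic_variation(2.0, n))
```

With $t = 2$ the printed values should settle near 2 as $n$ grows, with
fluctuations of typical size $\sqrt{2t^2/n}$.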

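For any $g$ let

\[ Q_t^{(n)}(g) = \sum_{j=0}^{n-1} g\Big(\frac{jt}{n}\Big)
   [W_{(j+1)t/n} - W_{jt/n}]^2, \]

and

\[ Q_t(g) = \lim_{n \to \infty} Q_t^{(n)}(g). \]

If $g$ is a step function of the form

\[ g(s) = u(W_{jt/m}), \qquad \frac{jt}{m} \le s < \frac{(j+1)t}{m}, \]

then

\begin{align*}
Q_t(g) &= \lim_{k \to \infty} \sum_{j=0}^{m-1} \sum_{i=0}^{k-1}
   u(W_{jt/m}) [W_{(kj+i+1)t/(km)} - W_{(kj+i)t/(km)}]^2 \\
  &= \sum_{j=0}^{m-1} u(W_{jt/m}) \lim_{k \to \infty} \sum_{i=0}^{k-1}
   [W_{(kj+i+1)t/(km)} - W_{(kj+i)t/(km)}]^2.
\end{align*}

The result about quadratic variation tells us that

\[ \lim_{k \to \infty} \sum_{i=0}^{k-1}
   [W_{(kj+i+1)t/(km)} - W_{(kj+i)t/(km)}]^2 = \frac{t}{m}. \]

Hence

\[ Q_t(g) = \sum_{j=0}^{m-1} u(W_{jt/m}) \frac{t}{m}. \]

Now assume $g$ is continuous. For each $n$, let $g_n$ be the step function

\[ g_n(s) = g\Big(\frac{jt}{n}\Big), \qquad
   \frac{jt}{n} \le s < \frac{(j+1)t}{n}. \]

Note that

\[ |Q_t(g) - Q_t(g_n)| \le \|g - g_n\| \, Q_t = t \, \|g - g_n\|, \]

where

\[ \|g - g_n\| = \sup_{0 \le s \le t} |g(s) - g_n(s)|. \]

The continuity of $g$ implies that $\|g - g_n\| \to 0$ as $n \to \infty$. Hence

\[ Q_t(g) = \lim_{n \to \infty} Q_t(g_n)
   = \lim_{n \to \infty} \sum_{j=0}^{n-1} g\Big(\frac{jt}{n}\Big) \frac{t}{n}. \]

The last expression is the usual representation of the integral of $g$ as a
limit of Riemann sums. Therefore, if $g$ is continuous,

\[ Q_t(g) = \int_0^t g(s) \, ds. \]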




Note that if $h$ is continuous then since $W_t$ is continuous, the function
$g(t) = h(W_t)$ is continuous. If we plug this result into (9.3) we can
conclude the following.


Itô’s Formula. If $f$ is a function with two continuous derivatives, and $W_t$
is a standard Brownian motion, then

\[ f(W_t) - f(W_0) = \int_0^t f'(W_s) \, dW_s
   + \frac{1}{2} \int_0^t f''(W_s) \, ds. \]

This formula is sometimes written in the differential form,

\[ df(W_t) = f'(W_t) \, dW_t + \frac{1}{2} f''(W_t) \, dt. \]


Example 1. Let $f(t) = t^2$. Then $f'(t) = 2t$, $f''(t) = 2$, and

\[ W_t^2 = \int_0^t 2 W_s \, dW_s + \frac{1}{2} \int_0^t 2 \, ds, \]

or

\[ \int_0^t W_s \, dW_s = \frac{1}{2} W_t^2 - \frac{t}{2}. \]

This turns out to be a particularly nice example; in general, one cannot use
Itô’s formula to calculate integrals exactly.
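
The correction term $-t/2$ can be seen on a single simulated path by comparing
the left-endpoint sum $\sum_j W_{jt/n} [W_{(j+1)t/n} - W_{jt/n}]$ with both
candidate answers. This is a sketch under assumed parameters (step count, seed,
function name); it is not a proof.

```python
import numpy as np


def ito_sum_vs_formula(t=1.0, n=100_000, seed=1):
    """Compare the left-endpoint sum for int W dW with (W_t^2 - t)/2 and W_t^2/2."""
    rng = np.random.default_rng(seed)
    dw = rng.normal(0.0, np.sqrt(t / n), n)
    w = np.concatenate(([0.0], np.cumsum(dw)))  # W_0 = 0, ..., W_t on the grid
    left_sum = float(np.sum(w[:-1] * dw))       # each bet uses the left endpoint
    ito_value = float(0.5 * w[-1] ** 2 - 0.5 * t)
    naive_value = float(0.5 * w[-1] ** 2)
    return left_sum, ito_value, naive_value


print(ito_sum_vs_formula())
```

The left-endpoint sum should agree closely with $(W_t^2 - t)/2$ and differ from
the ordinary-calculus guess $W_t^2/2$ by about $t/2$.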


Example 2. Consider the process

\[ X_t = e^{W_t}. \]

This process is called geometric Brownian motion and is often used to model
stock prices. Itô’s formula with $f(t) = e^t$ says that

\[ X_t - 1 = \int_0^t e^{W_s} \, dW_s + \frac{1}{2} \int_0^t e^{W_s} \, ds. \]

In other words $X_t$ satisfies the stochastic differential equation

\[ dX_t = X_t \, dW_t + \frac{1}{2} X_t \, dt. \]


9.4 Extensions of Itô’s Formula

Suppose $W_t$ is a standard Brownian motion and $Z_t$ satisfies

\[ dZ_t = X_t \, dt + Y_t \, dW_t \qquad (9.4) \]

where $X_t, Y_t$ are $\mathcal{F}_t$-measurable and have continuous paths. In
other words,

\[ Z_t = Z_0 + \int_0^t X_s \, ds + \int_0^t Y_s \, dW_s. \]

If $R_t$ is $\mathcal{F}_t$-measurable we define $\int_0^t R_s \, dZ_s$ by

\begin{align*}
\int_0^t R_s \, dZ_s &= \int_0^t R_s (X_s \, ds + Y_s \, dW_s) \\
  &= \int_0^t R_s X_s \, ds + \int_0^t R_s Y_s \, dW_s.
\end{align*}

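Suppose $f$ has two continuous derivatives. As in the previous section we can
write

\begin{align*}
f(Z_t) - f(Z_0)
  &= \sum_{j=0}^{n-1} f'(Z_{jt/n}) [Z_{(j+1)t/n} - Z_{jt/n}] \\
  &\quad + \frac{1}{2} \sum_{j=0}^{n-1} f''(Z_{jt/n})
     [Z_{(j+1)t/n} - Z_{jt/n}]^2
   + \sum_{j=0}^{n-1} o\Big(\frac{1}{n}\Big). \qquad (9.5)
\end{align*}

As $n \to \infty$, the last summation goes to zero. Since $Z_t$ satisfies
(9.4),

\[ Z_{(j+1)t/n} - Z_{jt/n} \approx X_{jt/n} \frac{t}{n}
   + Y_{jt/n} [W_{(j+1)t/n} - W_{jt/n}]. \]

In the limit, we get

\begin{align*}
\lim_{n \to \infty} & \sum_{j=0}^{n-1} f'(Z_{jt/n})
   [Z_{(j+1)t/n} - Z_{jt/n}] \\
  &= \lim_{n \to \infty} \sum_{j=0}^{n-1} f'(Z_{jt/n}) X_{jt/n} \frac{t}{n}
   + \lim_{n \to \infty} \sum_{j=0}^{n-1} f'(Z_{jt/n}) Y_{jt/n}
     [W_{(j+1)t/n} - W_{jt/n}] \\
  &= \int_0^t f'(Z_s) X_s \, ds + \int_0^t f'(Z_s) Y_s \, dW_s \\
  &= \int_0^t f'(Z_s) \, dZ_s.
\end{align*}

Similarly to the last section, we can see that

\[ \lim_{n \to \infty} \sum_{j=0}^{n-1} f''(Z_{jt/n})
   [Z_{(j+1)t/n} - Z_{jt/n}]^2 = \int_0^t f''(Z_s) \, d\langle Z \rangle_s, \]

where $\langle Z \rangle_t$ denotes the quadratic variation of $Z_t$,

\[ \langle Z \rangle_t = \lim_{n \to \infty} \sum_{j=0}^{n-1}
   [Z_{(j+1)t/n} - Z_{jt/n}]^2. \]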

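If we consider only the quadratic variation of the stochastic integral part of
$Z_t$, we get

\begin{align*}
\Big\langle \int_0^{\cdot} Y_s \, dW_s \Big\rangle_t
  &= \lim_{n \to \infty} \sum_{j=0}^{n-1}
     \Big( \int_{jt/n}^{(j+1)t/n} Y_s \, dW_s \Big)^2 \\
  &= \lim_{n \to \infty} \sum_{j=0}^{n-1}
     \big( [Y_{jt/n} + o(1)] [W_{(j+1)t/n} - W_{jt/n}] \big)^2 \\
  &= \int_0^t Y_s^2 \, ds.
\end{align*}

We have left out a number of details here but the basic idea is the same as in
the previous section. Also,

\[ Z_{(j+1)t/n} - Z_{jt/n} = O\Big(\frac{1}{n}\Big)
   + \int_{jt/n}^{(j+1)t/n} Y_s \, dW_s, \]

and hence

\[ [Z_{(j+1)t/n} - Z_{jt/n}]^2 = O\Big(\frac{1}{n^{3/2}}\Big)
   + \Big( \int_{jt/n}^{(j+1)t/n} Y_s \, dW_s \Big)^2, \]

\begin{align*}
\langle Z \rangle_t
  &= \lim_{n \to \infty} \sum_{j=0}^{n-1} [Z_{(j+1)t/n} - Z_{jt/n}]^2 \\
  &= \lim_{n \to \infty} \sum_{j=0}^{n-1}
     \Big( \int_{jt/n}^{(j+1)t/n} Y_s \, dW_s \Big)^2
   = \int_0^t Y_s^2 \, ds.
\end{align*}

In other words, the quadratic variation of $Z$ is the same as the quadratic
variation of its “stochastic integral” part. Combining all of this we get the
following.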


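Itô’s Formula II. If $f$ has two continuous derivatives and $Z_t$ satisfies
(9.4), then

\begin{align*}
f(Z_t) - f(Z_0)
  &= \int_0^t f'(Z_s) \, dZ_s
   + \frac{1}{2} \int_0^t f''(Z_s) \, d\langle Z \rangle_s \\
  &= \int_0^t f'(Z_s) Y_s \, dW_s
   + \int_0^t \Big[ f'(Z_s) X_s + \frac{1}{2} f''(Z_s) Y_s^2 \Big] ds.
\end{align*}

We now generalize a little more and assume that $f(t, x)$ is a function of both
time $t$ and space $x$. We will need to assume that $f$ has two continuous
derivatives in $x$, and one continuous derivative in $t$. We will write
$f'(t,x)$, $f''(t,x)$ for the partials with respect to $x$ and $\dot{f}(t,x)$
for the partial with respect to $t$. We can expand $f(t, Z_t) - f(0, Z_0)$ into
two telescoping sums

\begin{align*}
f(t, Z_t) - f(0, Z_0)
  &= \sum_{j=0}^{n-1} \Big[ f\Big(\frac{(j+1)t}{n}, Z_{(j+1)t/n}\Big)
     - f\Big(\frac{jt}{n}, Z_{(j+1)t/n}\Big) \Big] \\
  &\quad + \sum_{j=0}^{n-1} \Big[ f\Big(\frac{jt}{n}, Z_{(j+1)t/n}\Big)
     - f\Big(\frac{jt}{n}, Z_{jt/n}\Big) \Big].
\end{align*}

Using the approximation

\[ f\Big(\frac{(j+1)t}{n}, Z_{(j+1)t/n}\Big)
   - f\Big(\frac{jt}{n}, Z_{(j+1)t/n}\Big)
   \approx \dot{f}\Big(\frac{jt}{n}, Z_{(j+1)t/n}\Big) \frac{t}{n}, \]

it can be shown that

\[ \lim_{n \to \infty} \sum_{j=0}^{n-1}
   \Big[ f\Big(\frac{(j+1)t}{n}, Z_{(j+1)t/n}\Big)
     - f\Big(\frac{jt}{n}, Z_{(j+1)t/n}\Big) \Big]
   = \int_0^t \dot{f}(s, Z_s) \, ds. \]

The limit of the second telescoping sum can be handled as before. This gives
the following.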


Itô’s Formula III. If $f(t,x)$ has two continuous derivatives in $x$ and one
continuous derivative in $t$, and $Z_t$ satisfies (9.4), then

\begin{align*}
f(t, Z_t) - f(0, Z_0)
  &= \int_0^t f'(s, Z_s) \, dZ_s + \int_0^t \dot{f}(s, Z_s) \, ds
   + \frac{1}{2} \int_0^t f''(s, Z_s) \, d\langle Z \rangle_s \\
  &= \int_0^t f'(s, Z_s) Y_s \, dW_s \\
  &\quad + \int_0^t \Big[ \dot{f}(s, Z_s) + f'(s, Z_s) X_s
     + \frac{1}{2} f''(s, Z_s) Y_s^2 \Big] ds.
\end{align*}

A particular case of this formula occurs when $Z_t = W_t$. Then,

\[ f(t, W_t) - f(0, W_0) = \int_0^t f'(s, W_s) \, dW_s
   + \int_0^t \Big[ \dot{f}(s, W_s) + \frac{1}{2} f''(s, W_s) \Big] ds. \]

Example 1. Let $f(t,x) = e^{at + bx}$ where $a, b$ are real numbers. Then
$Z_t = e^{at + bW_t}$ satisfies the stochastic differential equation

\[ dZ_t = b Z_t \, dW_t + \Big( a + \frac{b^2}{2} \Big) Z_t \, dt. \]

Equivalently, the solution to the equation

\[ dZ_t = r Z_t \, dt + b Z_t \, dW_t \]

with $Z_0 = 1$ is

\[ Z_t = \exp\Big\{ b W_t + \Big( r - \frac{b^2}{2} \Big) t \Big\}. \qquad (9.6) \]
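
One way to gain confidence in (9.6) is to discretize the equation
$dZ_t = r Z_t \, dt + b Z_t \, dW_t$ directly and compare with the closed form
along the same simulated path. The sketch below uses the Euler-Maruyama scheme,
a standard discretization that is not discussed in the text; the parameter
values and function name are assumptions for illustration.

```python
import numpy as np


def gbm_euler_vs_exact(r=0.05, b=0.3, t=1.0, n=10_000, seed=2):
    """Euler-Maruyama approximation of dZ = r Z dt + b Z dW versus (9.6)."""
    rng = np.random.default_rng(seed)
    dt = t / n
    dw = rng.normal(0.0, np.sqrt(dt), n)
    z = 1.0  # Z_0 = 1
    for inc in dw:
        z += r * z * dt + b * z * inc  # one discretized step of the SDE
    # Closed form evaluated with the same Brownian path: W_t = sum of increments
    exact = float(np.exp(b * dw.sum() + (r - 0.5 * b ** 2) * t))
    return float(z), exact


print(gbm_euler_vs_exact())
```

The two numbers should agree to within a small relative error that shrinks as
the number of steps grows. Dropping the $-b^2/2$ correction in the exponent
would produce a visible mismatch.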
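Another generalization comes from considering Brownian motion in more
than one dimension. Suppose $W_t = (W_t^1, \ldots, W_t^d)$ is a standard
$d$-dimensional Brownian motion and $f(x^1, \ldots, x^d)$ is a function from
$\mathbb{R}^d$ to $\mathbb{R}$ that has continuous second derivatives. If we
expand $f$ in a Taylor series about $x = (x^1, \ldots, x^d)$, we get

\begin{align*}
f(y) &= f(x) + \sum_{j=1}^d f_j(x)(y^j - x^j) \\
  &\quad + \frac{1}{2} \sum_{j=1}^d \sum_{k=1}^d f_{jk}(x)
     (y^j - x^j)(y^k - x^k) + o(|y - x|^2).
\end{align*}

Here we use subscripts to denote partial derivatives. As before, we can write
$f(W_t)$ as a telescoping sum to show that, writing
$\Delta_l W^j = W^j_{(l+1)t/n} - W^j_{lt/n}$,

\begin{align*}
f(W_t) - f(W_0) = \lim_{n \to \infty} \Big[
  & \sum_{l=0}^{n-1} \sum_{j=1}^d f_j(W_{lt/n}) \, \Delta_l W^j
   + \frac{1}{2} \sum_{l=0}^{n-1} \sum_{j=1}^d f_{jj}(W_{lt/n})
     (\Delta_l W^j)^2 \\
  & + \frac{1}{2} \sum_{l=0}^{n-1} \sum_{j \ne k} f_{jk}(W_{lt/n}) \,
     \Delta_l W^j \, \Delta_l W^k \Big].
\end{align*}

The first two terms are of the type we have seen. To find the limit of the last
one, we show that the “covariation”

\[ \langle W^j, W^k \rangle_t = \lim_{n \to \infty} \sum_{l=0}^{n-1}
   (W^j_{(l+1)t/n} - W^j_{lt/n})(W^k_{(l+1)t/n} - W^k_{lt/n}) = 0,
   \qquad j \ne k. \]

This is done similarly to the quadratic variation in the previous section. In
this case,

\begin{align*}
E\Big[ \sum_{l=0}^{n-1} \Delta_l W^j \, \Delta_l W^k \Big]
  &= \sum_{l=0}^{n-1} E[\Delta_l W^j \, \Delta_l W^k] \\
  &= \sum_{l=0}^{n-1} E(\Delta_l W^j) \, E(\Delta_l W^k) = 0,
\end{align*}

\begin{align*}
\mathrm{Var}\Big[ \sum_{l=0}^{n-1} \Delta_l W^j \, \Delta_l W^k \Big]
  &= \sum_{l=0}^{n-1} \mathrm{Var}[\Delta_l W^j \, \Delta_l W^k] \\
  &= \sum_{l=0}^{n-1} E[(\Delta_l W^j)^2 (\Delta_l W^k)^2] \\
  &= \sum_{l=0}^{n-1} E[(\Delta_l W^j)^2] \, E[(\Delta_l W^k)^2] \\
  &= \sum_{l=0}^{n-1} \frac{t}{n} \cdot \frac{t}{n} = \frac{t^2}{n}.
\end{align*}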

Therefore, the last term in the telescoping sum for $f(W_t) - f(W_0)$ vanishes
in the limit. If $f$ also has a $t$-dependence, it can be handled as above. We
now summarize. Recall that the Laplacian of $f$ is defined by

\[ \Delta f(x) = \sum_{j=1}^d f_{jj}(x). \]

Itô’s Formula IV. Suppose $f(t, x^1, \ldots, x^d)$ is a function with one
continuous derivative in $t$ and two continuous derivatives in
$x = (x^1, \ldots, x^d)$. Suppose $W_t = (W_t^1, \ldots, W_t^d)$ is a standard
$d$-dimensional Brownian motion. Then,

\[ f(t, W_t) - f(0, W_0) = \sum_{j=1}^d \int_0^t f_j(s, W_s) \, dW_s^j
   + \int_0^t \Big[ \dot{f}(s, W_s) + \frac{1}{2} \Delta f(s, W_s) \Big] ds. \]


Stochastic calculus is similar to usual calculus with an additional rule added.
Let us consider calculus from a differential perspective. If $h(t)$ is a
function, and $\Delta h(t) = h(t + \Delta t) - h(t)$, then $h'(t)$ is defined
by the rule

\[ \Delta h(t) = h'(t) \, \Delta t + o(\Delta t), \qquad \Delta t \to 0. \]

To calculate $h'(t)$, we calculate $\Delta h(t)$ and then throw away all the
terms that are $o(\Delta t)$. For example, suppose $h(t) = f(t) g(t)$ where $f$
and $g$ are differentiable. Then

\begin{align*}
\Delta h(t) &= f(t + \Delta t) g(t + \Delta t) - f(t) g(t) \\
  &= f(t + \Delta t) [g(t + \Delta t) - g(t)]
     + g(t) [f(t + \Delta t) - f(t)] \\
  &= [f(t) + f'(t) \Delta t + o(\Delta t)] [g'(t) \Delta t + o(\Delta t)]
     + g(t) [f'(t) \Delta t + o(\Delta t)] \\
  &= [f(t) g'(t) + f'(t) g(t)] \Delta t + o(\Delta t).
\end{align*}

This gives the product rule $(fg)' = fg' + f'g$.

If $W_t^1, \ldots, W_t^d$ are independent Brownian motions, then
$\Delta W_t^j$, the increment of the Brownian motion, is of order
$\sqrt{\Delta t}$. Hence, if we multiply two of them together, we get something
of order $\Delta t$, which cannot be thrown away. If we multiply three of them
together, or if we multiply one of them times a term of order $\Delta t$, then
the product is of order $(\Delta t)^{3/2}$ and can be thrown away. So, in order
to do stochastic calculus one needs only add to usual calculus the rule for
handling products of two Brownian increments. Itô’s formula tells us what to
do. In differential notation, we have

\[ (\Delta W_t^j)^2 = \Delta \langle W^j \rangle_t = \Delta t, \]

\[ (\Delta W_t^j)(\Delta W_t^k) = \Delta \langle W^j, W^k \rangle_t = 0,
   \qquad j \ne k. \]

More generally, if

\[ dZ_t^1 = X_t^1 \, dt + \sum_{j=1}^d Y_t^{1,j} \, dW_t^j, \qquad (9.7) \]

\[ dZ_t^2 = X_t^2 \, dt + \sum_{j=1}^d Y_t^{2,j} \, dW_t^j, \qquad (9.8) \]

then the covariation term is

\[ \langle Z^1, Z^2 \rangle_t
   = \sum_{j=1}^d \int_0^t Y_s^{1,j} Y_s^{2,j} \, ds. \]

This allows us to derive the stochastic calculus product rule. Note that we
can write

\[ \Delta(Z_t^1 Z_t^2) = Z^1_{t + \Delta t} \, \Delta Z_t^2 + Z_t^2 \, \Delta Z_t^1
   = Z_t^1 \, \Delta Z_t^2 + Z_t^2 \, \Delta Z_t^1
     + \Delta Z_t^1 \, \Delta Z_t^2. \]

Product Rule. If $Z_t^1, Z_t^2$ satisfy (9.7) and (9.8), then

\[ d(Z_t^1 Z_t^2) = Z_t^1 \, dZ_t^2 + Z_t^2 \, dZ_t^1
   + d\langle Z^1, Z^2 \rangle_t. \qquad (9.9) \]


Example 2. Exponential Martingale. Suppose $dZ_t = Y_t \, dW_t$, so that $Z_t$
is a martingale. Itô’s formula shows that

\[ d[e^{Z_t}] = e^{Z_t} Y_t \, dW_t + \frac{1}{2} e^{Z_t} Y_t^2 \, dt
   = e^{Z_t} \, dZ_t + \frac{1}{2} e^{Z_t} \, d\langle Z \rangle_t. \]

Assume sufficient boundedness so that $E[e^{Z_t}] < \infty$; boundedness of
$Y_t$ is sufficient. One can see from the differential equation that $e^{Z_t}$
is a submartingale (i.e., $E(e^{Z_t} \mid \mathcal{F}_s) \ge e^{Z_s}$) but not a
martingale (if $Y$ is nonzero). One way to obtain a martingale is to subtract
the “dt” term. Another way is to multiply $e^{Z_t}$ by an appropriate process.
Let $M_t = e^{Z_t} R_t$ where

\[ R_t = \exp\Big\{ -\frac{1}{2} \int_0^t Y_s^2 \, ds \Big\}. \]

Note that $R_t$ is random but differentiable; in fact,
$\dot{R}_t = -(Y_t^2/2) R_t$. Since $R_t$ is differentiable,
$\langle e^{Z}, R \rangle_t = 0$ (since $\Delta(e^{Z_t}) \, \Delta R_t$ is of
order $(\Delta t)^{3/2}$). Therefore by the product rule we get

\[ dM_t = R_t \, d(e^{Z_t}) + e^{Z_t} \, dR_t = M_t Y_t \, dW_t = M_t \, dZ_t. \]

Hence, $M_t$ is a martingale. This is sometimes called the exponential
martingale since it satisfies a stochastic differential equation analogous to
the exponential differential equation $f'(t) = a f(t)$.
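
A quick Monte Carlo check illustrates the difference between the submartingale
$e^{Z_t}$ and the exponential martingale. The sketch assumes the simplest case,
a constant bet $Y_t = \sigma$, so that $Z_t = \sigma W_t$ and the compensating
factor is $e^{-\sigma^2 t/2}$; the parameter values and function name are
illustrative.

```python
import numpy as np


def exponential_martingale_means(sigma=1.0, t=1.0, n_paths=400_000, seed=3):
    """Estimate E[e^{Z_t}] and E[M_t] when Y_t = sigma is constant."""
    rng = np.random.default_rng(seed)
    w_t = rng.normal(0.0, np.sqrt(t), n_paths)
    z_t = sigma * w_t                     # Z_t = int_0^t sigma dW_s
    r_t = np.exp(-0.5 * sigma ** 2 * t)   # compensating factor for constant Y
    return float(np.exp(z_t).mean()), float((np.exp(z_t) * r_t).mean())


print(exponential_martingale_means())
```

The first estimate should be near $e^{\sigma^2 t/2} > 1$, reflecting the upward
drift of $e^{Z_t}$, while the second should be near $M_0 = 1$.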


9.5 Continuous Martingales 


If $W_t$ is a standard Brownian motion; $Y_t$ is measurable with respect to
$\mathcal{F}_t$, the information in $W_s$, $0 \le s \le t$; and

\[ \int_0^t E[Y_s^2] \, ds < \infty, \]

then

\[ M_t = \int_0^t Y_s \, dW_s, \qquad (9.10) \]

is a square-integrable martingale, i.e., a martingale with respect to
$\{\mathcal{F}_t\}$ satisfying $E[M_t^2] < \infty$. It is also a continuous
martingale which means that with probability one the function $t \mapsto M_t$
is continuous. Many of the results from Chapter 5 have analogues for continuous
martingales which can be proved with little extra effort. Note that if
$\delta > 0$, then $\tilde{M}_n = M_{\delta n}$ is a (discrete time) martingale
with respect to $\tilde{\mathcal{F}}_n = \mathcal{F}_{\delta n}$. If $T$ is a
stopping time with respect to $\mathcal{F}_t$ we define the stopping time
$T^{(\delta)}$ as the smallest integer $n$ such that $\delta n \ge T$. To
determine whether or not $T^{(\delta)} = n$ it suffices to see the Brownian
motion $W_t$ up through time $\delta n$; therefore $T^{(\delta)}$ is a stopping
time for the discrete time martingale. By letting $\delta \to 0$, the following
extensions of results from Chapter 5 can be established.


Optional Sampling Theorem I. If $M_t$ is a continuous martingale and $T$
is a bounded stopping time with respect to $\{\mathcal{F}_t\}$, then

\[ E[M_T] = M_0. \]

Optional Sampling Theorem II. If $M_t$ is a continuous martingale and $T$
is a stopping time with respect to $\{\mathcal{F}_t\}$ satisfying
$P\{T < \infty\} = 1$,

\[ E[|M_T|] < \infty, \]

and

\[ \lim_{t \to \infty} E[|M_t| \, 1\{T > t\}] = 0, \]

then

\[ E[M_T] = E[M_0]. \]

Maximal Inequality. If $M_t$ is a continuous square-integrable martingale of
the form (9.10), then for every $a > 0$,

\[ P\Big\{ \max_{0 \le s \le t} |M_s| \ge a \Big\} \le \frac{E[M_t^2]}{a^2}
   = \frac{1}{a^2} \int_0^t E[Y_s^2] \, ds. \]

If $M_t$ is a continuous martingale with respect to $\mathcal{F}_t$ and $T$ is
a stopping time then the stopped process $M_{t \wedge T}$ is a continuous
martingale. Here $t \wedge T = \min\{t, T\}$. Suppose $U$ is an open subset of
$\mathbb{R}^d$ and $f(t, x^1, \ldots, x^d)$ is a continuous function that has
one continuous derivative in $t$ and two continuous derivatives in the spatial
variable provided $x = (x^1, \ldots, x^d) \in U$. Suppose $Z_t$ satisfies (9.4)
and let $T$ denote the first time $t$ such that $Z_t$ is not in $U$. Then Itô’s
formula describes the evolution of $f(t \wedge T, Z_{t \wedge T})$ for $t < T$.
As an example, suppose $W_t$ is a standard $d$-dimensional Brownian motion and
$U$ is a bounded open set in $\mathbb{R}^d$. Let
$f : \mathbb{R}^d \to \mathbb{R}$ be a continuous function such that
$\Delta f(x) = 0$ for $x$ in $U$. Then, Itô’s formula shows that
$M_t = f(W_{t \wedge T})$ is a continuous martingale. If $W_0 \in U$, then $M_t$
is a bounded martingale (since $f$ is a bounded function on the compact set
$\bar{U}$), and therefore if $x \in U$,

\[ f(x) = E[M_0 \mid W_0 = x] = E[M_T \mid W_0 = x]
   = E[f(W_T) \mid W_0 = x]. \]
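
The identity $f(x) = E[f(W_T) \mid W_0 = x]$ can be illustrated in the simplest
setting, $d = 1$, $U = (0,1)$, and the harmonic function $f(x) = x$, where it
says that Brownian motion started at $x$ exits at 1 with probability $x$. The
sketch below approximates the path with small Gaussian steps; the step size,
sample size, and function name are assumptions, and the time discretization
introduces a small bias near the boundary.

```python
import numpy as np


def exit_value_estimate(x0=0.3, dt=1e-3, n_paths=5_000, seed=4):
    """Estimate E[f(W_T) | W_0 = x0] for f(x) = x and U = (0, 1)."""
    rng = np.random.default_rng(seed)
    w = np.full(n_paths, x0)
    alive = np.ones(n_paths, dtype=bool)  # paths that are still inside U
    while alive.any():
        w[alive] += rng.normal(0.0, np.sqrt(dt), int(alive.sum()))
        alive &= (w > 0.0) & (w < 1.0)
    # Clip the small overshoot so that f is evaluated on the boundary {0, 1}.
    return float(np.clip(w, 0.0, 1.0).mean())


print(exit_value_estimate())
```

The estimate should be close to the starting point $x_0$, up to Monte Carlo and
discretization error.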


9.6 Girsanov Transformation 


Suppose that we play a simple game. A coin is flipped. If it comes up
heads we win $1; otherwise, we lose $1. However, suppose the coin is unfair
so that it has probability 3/4 of coming up tails each time. Then this is an
unfair game. There are two natural ways to try to make this a fair game.


• Change the payoff so that we win $1.50 if it comes up heads and lose
only $.50 if it comes up tails. In this case the expected winning is zero.


• Change (or replace) the coin so that the probability of a heads is 1/2.


In this section, we will discuss a way of changing a continuous process with
drift to a process without drift that is analogous to the second option above.
Suppose $Z_t$ satisfies

\[ dZ_t = X_t \, dt + Y_t \, dW_t, \qquad (9.11) \]

where $W_t$ is a standard Brownian motion. We let $\mathcal{F}_t$ denote the
information in $\{W_s : s \le t\}$, and we assume that $X_t, Y_t$ are
$\mathcal{F}_t$-measurable. If $X_t$ is nonzero, then $Z_t$ is not a
martingale. One way to get a martingale from $Z_t$ is to subtract the “dt”
term. This is analogous to the first option in the previous paragraph. We will
describe another way to obtain a martingale, analogous to the second option,
which is called the Girsanov or Cameron-Martin transformation.

Instead of subtracting the drift, we will change the weight on paths. By
giving greater weight to those paths that are moving in the direction opposite
the drift, we will balance things so that the average drift is zero. To
illustrate the idea, we will start with a discrete example. Suppose
$J_1, J_2, \ldots$ are independent random variables with

\[ P\{J_i = 1\} = 1 - P\{J_i = -1\} = p, \]

where $0 < p < 1$. Let $S_0 = 0$, $S_n = J_1 + \cdots + J_n$, and let
$\mathcal{F}_n$ denote the information contained in $J_1, \ldots, J_n$. If
$p \ne 1/2$, then $S_n$ is not a martingale with respect to $\mathcal{F}_n$.
While $S_n - n(2p - 1)$ is a martingale, we will consider a different
martingale obtained by keeping the same paths but changing the measure. Our
process $S_n$ can be considered as a measure $P$ on random walk paths of length
$n$ that gives measure $p^{(n + S_n)/2} (1 - p)^{(n - S_n)/2}$ to each
particular path (note that the number of the first $n$ steps that are “+1” is
$(n + S_n)/2$ and the number of steps that are “−1” is $(n - S_n)/2$). We can
write

\[ p^{(n + S_n)/2} (1 - p)^{(n - S_n)/2}
   = [4p(1 - p)]^{n/2} \Big( \frac{p}{1 - p} \Big)^{S_n/2} 2^{-n}. \]

Let

\[ M_n = [4p(1 - p)]^{-n/2} \Big( \frac{1 - p}{p} \Big)^{S_n/2}. \]

We define a measure on paths $\tilde{P}$ by $\tilde{P} = M_n P$. To be more
precise, if $A$ is $\mathcal{F}_n$-measurable, then

\[ \tilde{P}(A) = E[I_A M_n], \]

where $I_A$ denotes the indicator function. Note that $\tilde{P}$ gives measure
$2^{-n}$ to each path. In particular, the process $S_n$, under the measure
$\tilde{P}$, is a martingale.
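
Because there are only $2^n$ paths, the claim that the reweighted measure is
uniform on paths can be verified by direct enumeration. The following sketch
does this for a biased walk; the values $n = 6$ and $p = 3/4$ and the function
name are illustrative assumptions.

```python
from itertools import product


def reweighted_path_measures(n, p):
    """Return (P(path) * M_n, S_n) for every +1/-1 path of length n."""
    out = []
    for steps in product((1, -1), repeat=n):
        s_n = sum(steps)
        # Probability of this particular path under the biased measure P
        prob = p ** ((n + s_n) / 2) * (1 - p) ** ((n - s_n) / 2)
        # Weight M_n as defined in the text
        m_n = (4 * p * (1 - p)) ** (-n / 2) * ((1 - p) / p) ** (s_n / 2)
        out.append((prob * m_n, s_n))
    return out


weights = reweighted_path_measures(6, 0.75)
print(min(w for w, _ in weights), max(w for w, _ in weights), 2 ** -6)
print(sum(w * s for w, s in weights))  # mean of S_n under the new measure
```

Every path should receive weight $2^{-n}$ up to rounding, and the mean of $S_n$
under the new measure should be 0, as for the fair walk.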

To generalize this idea, we will give a characterization of the weighting
function $M_n$. What makes this work is the fact that both $M_n$ and $M_n S_n$
are martingales (under the measure $P$), see Exercise 5.10. We need $M_n$ to be
a martingale in order for the measure to be well defined as we now demonstrate.
Suppose $A$ is measurable with respect to $\mathcal{F}_m$ and $m < n$. Then $A$
is also $\mathcal{F}_n$-measurable, so the two formulas for $\tilde{P}(A)$
should give the same answer. But, since $M_n$ is a martingale,

\[ E[M_n I_A] = E[E(M_n I_A \mid \mathcal{F}_m)]
   = E[I_A \, E(M_n \mid \mathcal{F}_m)] = E[M_m I_A]. \]

In order for $S_n$ to be a martingale under the measure $\tilde{P}$, we need to
show that if $m < n$, then

\[ E_{\tilde{P}}(S_n \mid \mathcal{F}_m) = S_m. \]

Here, $E_{\tilde{P}}(S_n \mid \mathcal{F}_m)$ denotes the conditional
expectation using the measure $\tilde{P}$. Using the definition of conditional
expectation, we see that this equality is equivalent to showing for all events
$A$ that are $\mathcal{F}_m$-measurable,

\[ E[I_A S_m M_m] = E[I_A S_n M_n]. \]

But, this is just another way of saying that
$E[M_n S_n \mid \mathcal{F}_m] = M_m S_m$, i.e., that $M_n S_n$ is a
martingale.

We return to the continuous case and assume that $Z_t$ satisfies (9.11). The
new weight will be given in terms of a nonnegative martingale $M_t$ (with
respect to $\mathcal{F}_t$) with $M_0 = 1$. We will define a new measure
$\tilde{P}$ by the relation “$d\tilde{P} = M_t \, dP$”. To be more precise, if
$A$ is an $\mathcal{F}_t$-measurable event, then

\[ \tilde{P}(A) = E[I_A M_t]. \]

If $s < t$, and $A$ is $\mathcal{F}_s$-measurable, then it is also
$\mathcal{F}_t$-measurable, so it may look like $\tilde{P}$ is not well
defined. However, since $M_t$ is a martingale,

\[ E[I_A M_t] = E[E(I_A M_t \mid \mathcal{F}_s)]
   = E[I_A \, E(M_t \mid \mathcal{F}_s)] = E[I_A M_s]. \]

This shows that $\tilde{P}(A)$ is well defined. We say that $M_t$ is the
Radon-Nikodym derivative of $\tilde{P}$ with respect to $P$.

We want to choose $M_t$ so that $Z_t$ is a $\tilde{P}$-martingale, i.e., a
martingale if we use the measure $\tilde{P}$. This will be true if $M_t$ and
$M_t Z_t$ are both martingales (with respect to $P$). This can be seen using an
argument as in the discrete case above.

Suppose $M_t$ is a martingale of the form $dM_t = R_t \, dW_t$. Then the
product rule tells us that

\begin{align*}
d[M_t Z_t] &= M_t \, dZ_t + Z_t \, dM_t + d\langle M, Z \rangle_t \\
  &= [M_t X_t + R_t Y_t] \, dt + [M_t Y_t + Z_t R_t] \, dW_t.
\end{align*}

If $R_t = -M_t X_t / Y_t$ and certain boundedness conditions hold, then the
“dt” term vanishes and this will be a martingale.


Girsanov transformation. If $Z_t$ satisfies (9.11) and $M_t$ is a martingale
satisfying

\[ dM_t = -\frac{X_t}{Y_t} M_t \, dW_t, \]

then $Z_t$ is a martingale with respect to the measure $\tilde{P}$ where

\[ d\tilde{P} = M_t \, dP. \]


Example 1. Suppose $Z_t$ is Brownian motion with drift, i.e.,

\[ dZ_t = \mu \, dt + dW_t. \]

We want $R_t = -\mu M_t$, so we need $M_t$ to satisfy the equation

\[ dM_t = -\mu M_t \, dW_t. \]

The solution to this is

\[ M_t = e^{-\mu W_t - (\mu^2/2) t} = \frac{e^{-\mu W_t}}{E[e^{-\mu W_t}]}. \]

Hence if we weight Brownian motion with drift by $M_t$ we get standard
Brownian motion. Note that if $\mu > 0$, then $M_t$ is larger for paths with
$W_t$ (and hence $Z_t$) smaller.
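
The effect of the weight in Example 1 can be checked by simulation at a single
time $t$: under $P$ the variable $Z_t$ has mean $\mu t$, but weighted
expectations $E[M_t \, g(Z_t)]$ should match those of a standard Brownian
motion, so that $E[M_t Z_t] = 0$ and $E[M_t Z_t^2] = t$. The parameters and
names below are assumptions for illustration.

```python
import numpy as np


def girsanov_check(mu=0.5, t=1.0, n_paths=400_000, seed=5):
    """Weighted and unweighted moments of Z_t = mu t + W_t."""
    rng = np.random.default_rng(seed)
    w_t = rng.normal(0.0, np.sqrt(t), n_paths)
    z_t = mu * t + w_t                            # Brownian motion with drift
    m_t = np.exp(-mu * w_t - 0.5 * mu ** 2 * t)   # weight from Example 1
    return {
        "E[M_t]": float(m_t.mean()),
        "E[Z_t]": float(z_t.mean()),
        "E[M_t Z_t]": float((m_t * z_t).mean()),
        "E[M_t Z_t^2]": float((m_t * z_t ** 2).mean()),
    }


print(girsanov_check())
```

This checks only the one-dimensional distribution at time $t$; the Girsanov
transformation makes the stronger statement that the whole path becomes a
standard Brownian motion under the new measure.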


Example 2. Suppose $Z_t$ satisfies

\[ dZ_t = r Z_t \, dt + b Z_t \, dW_t, \]

see (9.6). Then we would like to find $M_t$ satisfying
$dM_t = -(r/b) M_t \, dW_t$. We have seen that

\[ M_t = \exp\{ -(r/b) W_t - (r/b)^2 t/2 \} \]

satisfies this. Hence $Z_t$ is a $\tilde{P}$-martingale where
$d\tilde{P} = M_t \, dP$.


9.7 Feynman-Kac Formula

Suppose $Z_t$ satisfies the stochastic differential equation

\[ dZ_t = a(Z_t) \, dt + b(Z_t) \, dW_t \qquad (9.12) \]

where $a(x), b(x)$ are fixed functions. Such a $Z_t$ is often called a (time
homogeneous) diffusion. Note that $Z_t$ is Markovian, i.e., the dependence of
the future $\{Z_s : s \ge t\}$ on the past $\mathcal{F}_t$ lies entirely on the
value $Z_t$. There is a close relationship between diffusions and certain
second order partial differential equations.

Suppose $f(x), v(x)$ are two functions and let

\[ J_t = \exp\Big\{ \int_0^t v(Z_s) \, ds \Big\}, \]

\[ V(t, x) = E^x[f(Z_t) J_t]. \]

Here $E^x[Y]$ denotes $E[Y \mid Z_0 = x]$. We assume that this expectation
exists for all $t, x$. If $s < t$, then

\begin{align*}
E[f(Z_t) J_t \mid \mathcal{F}_s]
  &= J_s \, E\Big[ f(Z_t) \exp\Big\{ \int_s^t v(Z_r) \, dr \Big\}
     \,\Big|\, \mathcal{F}_s \Big] \\
  &= J_s V(t - s, Z_s).
\end{align*}

The left-hand side is a martingale since if $r < s$, then

\[ E[E(f(Z_t) J_t \mid \mathcal{F}_s) \mid \mathcal{F}_r]
   = E[f(Z_t) J_t \mid \mathcal{F}_r]. \]

Hence, we know that if $M_s = J_s V(t - s, Z_s)$, then $M_s$ is a martingale
for $0 \le s \le t$. Assuming sufficient differentiability, we can use Itô’s
formula and the product rule (9.9) to write

\begin{align*}
dM_s &= J_s \, dV(t - s, Z_s) + V(t - s, Z_s) \dot{J}_s \, ds \\
  &= J_s \Big[ v(Z_s) V(t - s, Z_s) - \dot{V}(t - s, Z_s)
     + V'(t - s, Z_s) a(Z_s) \\
  &\qquad + \frac{1}{2} V''(t - s, Z_s) b^2(Z_s) \Big] ds
     + J_s V'(t - s, Z_s) b(Z_s) \, dW_s.
\end{align*}

Since $M_s$ is a martingale, the $ds$ term must always be zero, and $V$
satisfies

\[ \dot{V}(t, x) = \frac{1}{2} b^2(x) V''(t, x) + a(x) V'(t, x)
   + v(x) V(t, x). \]


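Feynman-Kac Formula. The solution to the partial differential equation

\[ \dot{V}(t, x) = \frac{1}{2} b^2(x) V''(t, x) + a(x) V'(t, x)
   + v(x) V(t, x) \]

with initial condition $V(0, x) = f(x)$ is

\[ V(t, x) = E^x\Big[ f(Z_t) \exp\Big\{ \int_0^t v(Z_s) \, ds \Big\} \Big], \]

where $Z_t$ satisfies (9.12).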
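
As a numerical illustration, take a special case chosen for this example (it
is not from the text): $a = 0$, $b = 1$, so $Z_t$ is Brownian motion, with
$f \equiv 1$ and $v(x) = -x$. One can check by differentiation that
$V(t, x) = \exp\{-xt + t^3/6\}$ solves $\dot{V} = \frac{1}{2} V'' - xV$ with
$V(0, x) = 1$. The sketch below estimates the expectation on the right-hand
side of the formula by Monte Carlo, approximating the time integral with a
left Riemann sum; step counts and names are assumptions.

```python
import numpy as np


def feynman_kac_mc(x=0.2, t=1.0, n_steps=200, n_paths=20_000, seed=6):
    """Estimate E^x[exp(int_0^t v(Z_s) ds)] for Brownian Z, f = 1, v(x) = -x."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
    z = x + np.cumsum(dw, axis=1)  # Z at times dt, 2 dt, ..., t
    z_left = np.hstack([np.full((n_paths, 1), x), z[:, :-1]])
    integral = (-z_left).sum(axis=1) * dt  # left Riemann sum of v(Z_s)
    return float(np.exp(integral).mean())


def closed_form(x, t):
    # Candidate solution of V_t = (1/2) V'' - x V with V(0, x) = 1
    return float(np.exp(-x * t + t ** 3 / 6))


print(feynman_kac_mc(), closed_form(0.2, 1.0))
```

The two values should agree up to Monte Carlo error and a small bias from the
time discretization.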
By setting $v \equiv 0$, we see that $V(t, x) = E^x[f(Z_t)]$ satisfies

\[ \dot{V}(t, x) = \frac{1}{2} b^2(x) V''(t, x) + a(x) V'(t, x). \]

We can write

\[ E^x[f(Z_t)] = \int_{-\infty}^{\infty} f(y) \, p(t, x, y) \, dy, \]

where $p(t, x, \cdot)$ denotes the density of the random variable $Z_t$
assuming $Z_0 = x$. If we fix $x$, then $p(t, y) = p(t, x, y)$ is the solution
to the equation with initial condition “delta function” at $x$. In particular,
$p$ satisfies

\[ \dot{p}(t, y) = \frac{1}{2} b^2(y) p''(t, y) + a(y) p'(t, y). \]


In the next section we will need a Feynman-Kac formula for a time
inhomogeneous diffusion

\[ dZ_t = a(t, Z_t) \, dt + b(t, Z_t) \, dW_t. \qquad (9.13) \]

Let $v(t, x), f(x)$ be given functions. We fix a $t_0$ and consider only
$0 \le t \le t_0$. Let

\[ J_t = \exp\Big\{ \int_0^t v(s, Z_s) \, ds \Big\} \]

and let

\[ V(t, x) = E\Big[ f(Z_{t_0}) \exp\Big\{ \int_t^{t_0} v(s, Z_s) \, ds \Big\}
   \,\Big|\, Z_t = x \Big]. \]

Then,

\[ E[f(Z_{t_0}) J_{t_0} \mid \mathcal{F}_t] = J_t V(t, Z_t). \]

Since the left-hand side is a martingale, so is the right-hand side. Using the
product rule and Itô’s formula we see that

\[ -\dot{V}(t, x) = \frac{1}{2} b^2(t, x) V''(t, x) + a(t, x) V'(t, x)
   + v(t, x) V(t, x). \qquad (9.14) \]

Note that $V(t_0, x) = f(x)$.

Feynman-Kac Formula II. The solution to (9.14) for $0 \le t \le t_0$ with
$V(t_0, x) = f(x)$ is

\[ V(t, x) = E\Big[ f(Z_{t_0}) \exp\Big\{ \int_t^{t_0} v(s, Z_s) \, ds \Big\}
   \,\Big|\, Z_t = x \Big] \qquad (9.15) \]

where $Z_t$ satisfies (9.13).


9.8 Black-Scholes Formula 


The Black-Scholes formula is a way to calculate the current value of an
option that is based on the price of a stock following a stochastic differential
equation. Suppose $S_t$ denotes the price of a stock, and $S_t$ satisfies

$$
dS_t = \mu S_t\, dt + \sigma S_t\, dW_t.
$$

By (9.6), the solution of this is

$$
S_t = S_0 \exp\Big\{ \Big( \mu - \frac{\sigma^2}{2} \Big) t + \sigma W_t \Big\}.
$$

Assume also that one can buy or sell a bond with guaranteed interest rate r.
If we let $Y_t$ be the amount of money invested in bonds, then if we do not buy
or sell any bonds the amount grows according to the equation

$$
dY_t = r\, Y_t\, dt.
$$


A European call option (with strike price K at time T) is an opportunity
to buy one share of the stock at time T for price K. If $S_T < K$ such an
option is useless, but if $S_T > K$, then it has a value of $S_T - K$, which
is the profit obtained by buying the stock and then selling it immediately.
We can write the value as $(S_T - K)_+$ where $x_+ = \max\{x, 0\}$. The Black-Scholes
formula determines the value of this option at a time $t < T$ under
the assumption that there are no arbitrage opportunities. Let $V_t$ denote this
value. Clearly $V_T = (S_T - K)_+$, and $V_t$ should be measurable with respect to
$\mathcal{F}_t$, the information at time t. It is reasonable to assume that $V_t = V(t, S_t)$;
we will determine this function. Note that $V(T,x) = (x - K)_+$.

We can think of the option as an asset with value $V_t$ at time $t < T$. Suppose
we sell such an option at time $t < T$ and invest the money in a portfolio
consisting of a combination of the stock and the bond, say $X_t$ shares of the
stock and $Y_t$ invested in the bond. We assume we have a buying and selling
strategy between bonds and stocks based on the stock price at a certain time.
Here $Y_t$ is determined by the $X_t$ and the relationship that stocks are bought
only with money obtained from selling bonds and vice versa.

The value of the total portfolio (one option sold plus the total of assets in
bonds and stocks) at time t is

$$
U_t = -V(t, S_t) + O_t,
$$

where

$$
O_t = X_t S_t + Y_t. \tag{9.16}
$$

For ease, let us assume that $U_0 = 0$, i.e., at time t = 0 we sold one option and
invested that money in some combination of bond and stock.

Suppose we monitor this investment up to time T (switching between shares
of the stock and the bond based on the price of the stock) using a strategy
that guarantees that $U_T \ge 0$. If it is also true that with positive probability
$U_T > 0$, then we have found a way to gain money (with positive probability)
without any risk. This is called an arbitrage. Similarly, if there is a strategy
to guarantee $U_T \le 0$ with a chance that $U_T < 0$, then there are arbitrage
possibilities by buying an option. The main assumption in the Black-Scholes
formula is: there are no arbitrage opportunities with “self-financing” strategies.

The self-financing assumption is that the change in the total value of the
bond/stock portfolio is given by

$$
dO_t = X_t\, dS_t + r\, Y_t\, dt. \tag{9.17}
$$

In other words, the change in the value is the number of shares of stock times
the change in stock price plus the number of units of the bond times the
change in bond price. Assuming (9.17), we can use Itô's formula to write

$$
\begin{aligned}
dU_t &= -dV(t, S_t) + dO_t \\
&= -\dot V(t, S_t)\, dt - V'(t, S_t)\, dS_t - \tfrac{1}{2} V''(t, S_t)\, d\langle S \rangle_t
   + X_t\, dS_t + r\, Y_t\, dt.
\end{aligned}
$$
Now, to remove the randomness from the value of the portfolio we choose
$X_t = V'(t, S_t)$. This makes the coefficient of $dW_t$ zero and

$$
dU_t = \Big[ -\dot V(t, S_t) - \tfrac{1}{2} V''(t, S_t)\, \sigma^2 S_t^2 + r\, Y_t \Big]\, dt. \tag{9.18}
$$

The assumption of no arbitrage tells us that this must equal zero.
Using the product rule (9.9) on (9.16), we see that

$$
dO_t = X_t\, dS_t + dY_t + S_t\, dX_t + d\langle X, S \rangle_t.
$$

Hence, the self-financing condition (9.17) can be written as

$$
dY_t = r\, Y_t\, dt - S_t\, dX_t - d\langle X, S \rangle_t.
$$


Since $X_t = V'(t, S_t)$, Itô's formula gives

$$
dX_t = \Big[ \dot V'(t, S_t) + V''(t, S_t)\, \mu S_t + \tfrac{1}{2} V'''(t, S_t)\, \sigma^2 S_t^2 \Big]\, dt
       + V''(t, S_t)\, \sigma S_t\, dW_t.
$$

Hence $Y_t$ must satisfy

$$
dY_t = r\, Y_t\, dt - \Big[ \dot V'(t, S_t)\, S_t + V''(t, S_t)\, (\mu + \sigma^2)\, S_t^2
       + \tfrac{1}{2} V'''(t, S_t)\, \sigma^2 S_t^3 \Big]\, dt - V''(t, S_t)\, \sigma S_t^2\, dW_t. \tag{9.19}
$$

Let

$$
Y_t = V(t, S_t) - S_t X_t = V(t, S_t) - S_t\, V'(t, S_t),
$$

and assume that V(t,x) has been chosen so the quantity in (9.18) vanishes,
i.e.,

$$
\dot V(t,x) + \tfrac{1}{2} x^2 \sigma^2\, V''(t,x) + r x\, V'(t,x) - r\, V(t,x) = 0. \tag{9.20}
$$

Then an Itô's formula calculation shows that (9.19) holds.

One can get lost in the calculation, so it is worth understanding why it
works. If there are no arbitrage opportunities and the option is priced properly,
then any strategy that produces no randomness must also produce no
gain or loss. Hence the current value of the portfolio, $O_t$, must also be the
price of the option at that time, i.e., $V(t, S_t) = O_t$. Since we know that we
must have $V'(t, S_t)$ shares of the stock to hedge the option, the assets in bonds
must be

$$
Y_t = O_t - X_t S_t = V(t, S_t) - V'(t, S_t)\, S_t.
$$

Plugging into (9.18) we get the Black-Scholes equation (9.20).
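The hedging argument can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the text: it sells one call for the closed-form value stated later in this section, holds V' shares of stock, keeps the rest in bonds, and rebalances at discrete times using only money moved between bonds and stock. It relies on the standard fact, not derived here, that for the call V' equals Φ evaluated at the first argument of the closed form. All function names are hypothetical.

```python
import math
import random


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def call_value_and_delta(x, K, r, sigma, tau):
    """Closed-form call value V and hedge ratio V' with time tau > 0 remaining."""
    s = sigma * math.sqrt(tau)
    d_plus = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / s
    value = x * norm_cdf(d_plus) - K * math.exp(-r * tau) * norm_cdf(d_plus - s)
    return value, norm_cdf(d_plus)


def hedging_error(s0, K, r, sigma, mu, T, n_steps, rng):
    """Terminal value of the stock/bond hedge minus the option payoff, one path."""
    dt = T / n_steps
    s = s0
    value, shares = call_value_and_delta(s, K, r, sigma, T)
    bonds = value - shares * s  # Y_0 = V - V' S
    for k in range(1, n_steps + 1):
        # exact step of dS = mu S dt + sigma S dW
        s *= math.exp((mu - 0.5 * sigma ** 2) * dt
                      + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        bonds *= math.exp(r * dt)  # dY = r Y dt between trades
        if k < n_steps:
            _, new_shares = call_value_and_delta(s, K, r, sigma, T - k * dt)
            bonds -= (new_shares - shares) * s  # self-financing rebalance
            shares = new_shares
    return shares * s + bonds - max(s - K, 0.0)


if __name__ == "__main__":
    rng = random.Random(1)
    errors = [hedging_error(100.0, 100.0, 0.05, 0.2, 0.15, 1.0, 100, rng)
              for _ in range(300)]
    print(sum(errors) / len(errors))
```

With discrete rebalancing the error is not exactly zero, but it should be small compared with the size of the payoff and should shrink as the number of rebalancing steps grows. Changing `mu` should not change this picture, in line with the observation that the drift does not enter the equation.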

Note that the Black-Scholes equation has r and $\sigma^2$ as parameters but $\mu$
does not appear! The value of the option depends only on the bond rate and
the variance parameter (sometimes called the volatility) $\sigma^2$. We need to find
the solution of this equation with boundary condition $V(T,x) = (x - K)_+$.
The dependence on r can be removed by a simple change of variables: if V
satisfies (9.20) with r = 0,

$$
\dot V(t,x) + \tfrac{1}{2} x^2 \sigma^2\, V''(t,x) = 0, \tag{9.21}
$$

and $\tilde V(t,x) = e^{r(t-T)}\, V(t, e^{r(T-t)} x)$, then $\tilde V(t,x)$ satisfies (9.20) and
$\tilde V(T,x) = V(T,x)$. This can be checked by differentiation (Exercise 9.7); however, there
is a simple reason why this is true. If money grows at rate r, then x dollars
at time T is the equivalent of $e^{r(t-T)} x$ dollars at time t. Hence, it suffices to
solve the equation when r = 0.
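As a concrete instance of this change of variables (an illustrative choice of solution, not taken from the text), one can take $V(t,x) = x^2 e^{\sigma^2 (T-t)}$, which satisfies (9.21), and check directly that the transformed function satisfies (9.20):

```latex
\begin{aligned}
V(t,x) &= x^2 e^{\sigma^2 (T-t)}, \qquad
\dot V + \tfrac12 x^2 \sigma^2 V'' = -\sigma^2 V + \sigma^2 V = 0, \\
\tilde V(t,x) &= e^{r(t-T)}\, V\big(t, e^{r(T-t)} x\big) = x^2 e^{(r+\sigma^2)(T-t)}, \\
\dot{\tilde V} + \tfrac12 x^2 \sigma^2 \tilde V'' + r x \tilde V' - r \tilde V
  &= \big[ -(r+\sigma^2) + \sigma^2 + 2r - r \big]\, \tilde V = 0 .
\end{aligned}
```

At t = T both functions reduce to $x^2$, so the terminal condition is unchanged.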

A probabilistic form for the solution of (9.21) is given by the Feynman-Kac
formula (9.15); in fact, this form can be used for options with different payoffs
V(T,x) = g(x). Assume r = 0. Remembering that $V(t, S_t) = O_t$, we get

$$
dV(t, S_t) = V'(t, S_t)\, dS_t.
$$

If $V(t, S_t)$ were a martingale, we would know that

$$
E[V(t, S_t)] = E[V(T, S_T)] = E[g(S_T)].
$$

Recall that $S_t$ satisfies

$$
dS_t = \mu S_t\, dt + \sigma S_t\, dW_t.
$$

This is a martingale only if $\mu = 0$. However, we have seen that the value of
the option does not depend on the value of $\mu$, so we can set $\mu = 0$. If $\mu = 0$
the solution to the stochastic differential equation is

$$
S_t = \exp\Big\{ \sigma W_t - \frac{\sigma^2 t}{2} \Big\}.
$$

Then we have

$$
\begin{aligned}
V(T-t, x) &= E[ g(S_t) \mid S_0 = x ] \\
&= E\Big[ g\Big( \exp\Big\{ \sigma W_t - \frac{\sigma^2 t}{2} \Big\} \Big) \,\Big|\, W_0 = \frac{\log x}{\sigma} \Big] \\
&= E\Big[ g\Big( x \exp\Big\{ \sigma \sqrt{t}\, N - \frac{\sigma^2 t}{2} \Big\} \Big) \Big],
\end{aligned}
$$

where N is a standard unit normal.
Suppose $g(y) = (y - K)_+$. Then,

$$
V(T-t, e^{\sigma^2 t/2} x) = E\big[ (x e^{\sigma \sqrt{t}\, N} - K)_+ \big].
$$
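For a general payoff g, the expectation over the standard normal N in the r = 0 case can be estimated by direct sampling. This is a minimal sketch with a hypothetical function name; the sample size and seed are arbitrary choices.

```python
import math
import random


def option_value_r0(g, x, sigma, t, n_samples=20000, seed=0):
    """Estimate E[g(x exp(sigma sqrt(t) N - sigma^2 t / 2))], N standard normal."""
    rng = random.Random(seed)
    scale = sigma * math.sqrt(t)
    shift = -0.5 * sigma * sigma * t
    total = 0.0
    for _ in range(n_samples):
        total += g(x * math.exp(scale * rng.gauss(0.0, 1.0) + shift))
    return total / n_samples


if __name__ == "__main__":
    K = 100.0
    # Call payoff g(y) = (y - K)_+ with current price 100, sigma = 0.2, t = 1.
    print(option_value_r0(lambda y: max(y - K, 0.0), 100.0, 0.2, 1.0))
```

Taking g(y) = y should return approximately x, reflecting the fact that the price is a martingale when the drift is zero.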
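A straightforward, although tedious, calculation (see Exercise 9.4) shows that
the right-hand side is

$$
x\, e^{\sigma^2 t/2}\, \Phi\Big( \frac{\log(x/K) + \sigma^2 t}{\sigma \sqrt{t}} \Big)
 - K\, \Phi\Big( \frac{\log(x/K)}{\sigma \sqrt{t}} \Big),
$$

where $\Phi$ denotes the standard normal distribution function. Hence $V(T-t, x)$
is given by

$$
x\, \Phi\Big( \frac{\log(x/K) + \frac{\sigma^2}{2} t}{\sigma \sqrt{t}} \Big)
 - K\, \Phi\Big( \frac{\log(x/K) - \frac{\sigma^2}{2} t}{\sigma \sqrt{t}} \Big).
$$

This is the solution for r = 0, and we can easily convert it to the solution for
general r.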


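Black-Scholes Formula. Suppose V(t,x) is the solution to (9.20) satisfying
$V(T,x) = (x - K)_+$. Then $V(T-t, x)$ equals

$$
x\, \Phi\Big( \frac{\log(x/K) + (r + \frac{\sigma^2}{2})\, t}{\sigma \sqrt{t}} \Big)
 - K e^{-rt}\, \Phi\Big( \frac{\log(x/K) + (r - \frac{\sigma^2}{2})\, t}{\sigma \sqrt{t}} \Big),
$$

where $\Phi$ is the standard normal distribution function.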
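The formula is straightforward to evaluate numerically. The following sketch uses hypothetical function names and writes Φ in terms of the error function from the Python standard library; here t is the time remaining until T and x is the current stock price.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def black_scholes_call(x, K, r, sigma, t):
    """V(T - t, x): value of the call with time t remaining and stock price x."""
    if t <= 0.0:
        return max(x - K, 0.0)  # terminal condition V(T, x) = (x - K)_+
    s = sigma * math.sqrt(t)
    d_plus = (math.log(x / K) + (r + 0.5 * sigma * sigma) * t) / s
    d_minus = d_plus - s  # equals (log(x/K) + (r - sigma^2/2) t) / (sigma sqrt(t))
    return x * norm_cdf(d_plus) - K * math.exp(-r * t) * norm_cdf(d_minus)


if __name__ == "__main__":
    print(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))
```

Setting r = 0 recovers the expression for the r = 0 solution given just before the formula.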


Let us generalize and assume that $S_t$ satisfies

$$
dS_t = \mu(t, S_t)\, S_t\, dt + \sigma(t, S_t)\, S_t\, dW_t,
$$

where $\mu(t,x)$, $\sigma(t,x)$ are given functions. We cannot give an explicit solution
to this stochastic differential equation. However, we can still give an
expression for the value of a European call option. We assume that we have a
self-financing portfolio with value $O_t = X_t S_t + Y_t$ that “hedges” the option.
If V(t,x) denotes the value of the option, then we choose $X_t = V'(t, S_t)$
in order to remove the randomness. Assuming no arbitrage, the value of the
portfolio using the hedging strategy is exactly the same as the value of the option
at that time. Therefore $Y_t = O_t - X_t S_t = V(t, S_t) - V'(t, S_t)\, S_t$. Hence,
we again obtain the Black-Scholes equation (9.20) where $\sigma^2$ is replaced with
$\sigma^2(t,x)$. We need to find the solution to

$$
\dot V(t,x) + \tfrac{1}{2} x^2 \sigma^2(t,x)\, V''(t,x) + r x\, V'(t,x) - r\, V(t,x) = 0,
$$

with V(T,x) = g(x). Note again that $\mu(t,x)$ does not appear in the equation.
In most cases, there is no closed form for this solution. However, the Feynman-Kac
formula (9.15) gives the value in terms of an expectation that can be
estimated by simulation.
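One way to carry this out is sketched below. It rests on a reading of (9.14) and (9.15) that is an assumption of this example: matching coefficients suggests a(t,x) = r x, b(t,x) = σ(t,x) x and v = −r, so that the value is $e^{-r(T-t)} E[g(Z_T) \mid Z_t = x]$ with $dZ_t = r Z_t\, dt + \sigma(t, Z_t) Z_t\, dW_t$. The function name, the Euler time stepping, and the choice of a call payoff are illustrative.

```python
import math
import random


def call_value_mc(x, K, r, sigma_fn, T, n_steps=50, n_paths=20000, seed=0):
    """Estimate e^{-rT} E[(Z_T - K)_+ | Z_0 = x] by Euler simulation of
    dZ = r Z dt + sigma_fn(t, Z) Z dW."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        z = x
        for k in range(n_steps):
            t = k * dt
            z += r * z * dt + sigma_fn(t, z) * z * sqrt_dt * rng.gauss(0.0, 1.0)
        total += max(z - K, 0.0)
    return math.exp(-r * T) * total / n_paths


if __name__ == "__main__":
    # With a constant sigma the estimate can be compared with the closed form.
    print(call_value_mc(100.0, 100.0, 0.05, lambda t, z: 0.2, 1.0))
```

For constant σ the estimate should be close to the closed-form value; for a genuinely state-dependent σ(t,x) only the function passed as `sigma_fn` changes.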
9.9 Simulation


Consider a stochastic differential equation

$$
dX_t = a(X_t)\, dt + b(X_t)\, dW_t,
$$

where a and b are relatively nice functions of x and $W_t$ denotes a standard
Brownian motion. The solution is a process $X_t$ that at any particular time
looks like a Brownian motion with drift parameter $a(X_t)$ and variance
parameter $b(X_t)^2$. While it is often difficult to give an explicit solution to the
equation, it is easy to simulate the process on a computer using a random
walk.

Choose some small number $\Delta t$. We can approximate the Brownian motion
by a simple random walk with time increments $\Delta t$ and space increments $\sqrt{\Delta t}$.
To do this let $Y_1, Y_2, \ldots$ be independent random variables with

$$
P\{Y_i = 1\} = P\{Y_i = -1\} = \frac{1}{2}.
$$

We set $X_0 = 0$ and for n > 0,

$$
X_{n \Delta t} = X_{(n-1)\Delta t} + a(X_{(n-1)\Delta t})\, \Delta t + b(X_{(n-1)\Delta t})\, \sqrt{\Delta t}\; Y_n.
$$

In practice, it is often just as easy to make the increments normal. If $Z_1, Z_2, \ldots$
are independent standard unit normals, we can set $X_0 = 0$ and for n > 0,

$$
X_{n \Delta t} = X_{(n-1)\Delta t} + a(X_{(n-1)\Delta t})\, \Delta t + b(X_{(n-1)\Delta t})\, \sqrt{\Delta t}\; Z_n.
$$
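Both recursions can be sketched in a few lines. The function name and argument names below are hypothetical; the flag `normal` switches between the normal increments $Z_n$ and the ±1 increments $Y_n$, and the starting point is passed as an argument rather than fixed at 0.

```python
import math
import random


def simulate_sde(a, b, x0, t_final, dt, rng, normal=True):
    """Return an approximation of X at time t_final for dX = a(X) dt + b(X) dW."""
    n_steps = round(t_final / dt)
    sqrt_dt = math.sqrt(dt)
    x = x0
    for _ in range(n_steps):
        # Z_n (standard normal) or Y_n (+1 or -1 with probability 1/2 each)
        step = rng.gauss(0.0, 1.0) if normal else rng.choice((-1.0, 1.0))
        x += a(x) * dt + b(x) * sqrt_dt * step
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    # Example: a(x) = -x, b(x) = 1, started at 0, simulated up to time 1.
    samples = [simulate_sde(lambda x: -x, lambda x: 1.0, 0.0, 1.0, 0.01, rng)
               for _ in range(1000)]
    print(sum(samples) / len(samples))
```

Averaging a function of the returned value over many independent runs gives an estimate of the corresponding expectation.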


9.10 Exercises 


9.1 Let $W_t$ be a standard one-dimensional Brownian motion with $W_0 = 1$
and let r be a real number. Let T be the first time that $W_t = 0$. Let $R_t = W_t^r$.

(a) Write the stochastic differential equation for $R_t$ (valid for t < T), i.e.,
find f, g such that

$$
dR_t = f(R_t)\, dt + g(R_t)\, dW_t.
$$

(b) Find a function F such that $M_{t \wedge T}$ is a martingale where

$$
M_t = R_t \exp\Big\{ \int_0^t F(R_s)\, ds \Big\}.
$$
9.2 Let d > 1 and let $W_t$ denote a standard d-dimensional Brownian motion
starting at $x \ne 0$. Let $M_t = \log|W_t|$ if d = 2 and $M_t = |W_t|^{2-d}$ if d > 2.
Show that $M_t$ is a martingale.


9.3 Let $W_t$ be a standard one-dimensional Brownian motion and let a, b > 0.
Let $T_{a,-b}$ be the first time t such that $W_t = a$ or $W_t = -b$.
(a) Use the martingale $W_t$ to find $P\{W_{T_{a,-b}} = a\}$.
(b) Use the martingale $W_t^2 - t$ to find $E[T_{a,-b}]$.
(c) Explain why the random variables $T_{a,-a}$ and $W_{T_{a,-a}}$ are independent.
(d) Are the random variables $T_{a,-b}$ and $W_{T_{a,-b}}$ independent for all a, b?
(e) Use the martingale $e^{\lambda W_t - (\lambda^2/2) t}$ to compute the moment generating
function for $T_{a,-a}$.


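9.4 Suppose N is a standard unit normal and $X = a e^{bN}$ where a, b > 0.
Show that the density of X is

$$
f(x) = \frac{1}{bx}\, \phi\Big( \frac{\log(x/a)}{b} \Big), \qquad 0 < x < \infty,
$$

where $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ is the density for N. If K > 0, show that

$$
\int_0^\infty (x - K)_+\, f(x)\, dx
 = a\, e^{b^2/2}\, \Phi\Big( \frac{\log(a/K) + b^2}{b} \Big) - K\, \Phi\Big( \frac{\log(a/K)}{b} \Big),
$$

where $\Phi$ denotes the distribution function for N.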


9.5 Let $X_1, X_2, \ldots$ be independent N(0,1) random variables and let f be a
bounded continuous function. Let $Z_0 = 0$ and for n > 0,

$$
Z_n = Z_{n-1} + f(Z_{n-1}) + X_n.
$$

We will do the Girsanov transformation for $Z_n$ to make $Z_n$ a martingale (with
respect to $\mathcal{F}_n$ where $\mathcal{F}_n$ is the information in $X_1, \ldots, X_n$).

(a) If a is a real number, compute $E[X_n e^{a X_n}]$. (One can do it directly, or
one can differentiate the moment generating function $E[e^{a X_n}]$ with respect to
a.)

(b) Let $M_0 = 1$ and for n > 0,

$$
M_n = \exp\Big\{ -\sum_{j=1}^{n} f(Z_{j-1})\, X_j - \frac{1}{2} \sum_{j=1}^{n} f(Z_{j-1})^2 \Big\}.
$$

Show that $M_n$ is a martingale with respect to $\mathcal{F}_n$.
(c) Show that $M_n Z_n$ is a martingale with respect to $\mathcal{F}_n$.
(d) Show that $Z_n$ is a $\tilde P$-martingale where $d\tilde P = M_n\, dP$.
9.6 Suppose $W_t$ is a standard one-dimensional Brownian motion. Suppose
$Z_0 = 1$ and $Z_t$ satisfies the Bessel equation

$$
dZ_t = \frac{a}{Z_t}\, dt + dW_t.
$$

Here a is a real number and we only consider $t < T = \min\{s : Z_s = 0\}$.

(a) Find a nonconstant differentiable function $\phi$ such that $M_t = \phi(Z_{t \wedge T})$
is a martingale. (Hint: use Itô's formula to find a differential equation that $\phi$
should satisfy and then solve the equation.)

(b) If $0 < \epsilon < 1 < \alpha$ and $S = S(\epsilon, \alpha)$ denotes the first time t such that
$Z_t = \epsilon$ or $Z_t = \alpha$, find $P\{Z_S = \epsilon\}$.

(c) Find the probability that there exists some time t with $Z_t = \epsilon$. For
which values of a is this probability equal to one?

(d) For which values of a does the process reach the origin in finite time? 


9.7 Show that if V(t,x) satisfies (9.21), then $\tilde V(t,x) := e^{r(t-T)}\, V(t, e^{r(T-t)} x)$
satisfies (9.20).


9.8 COMPUTER SIMULATION. Assume $X_t$ is a process satisfying the
stochastic differential equation

$$
dX_t = a(X_t)\, dt + b(X_t)\, dW_t,
$$


where 


2b) 
Wa) = 45 xr< 0. 


Using $\Delta t = 1/100$ run many simulations of $X_t$. Estimate the following

(a) E(X;), 

(b) P{.X; > 0} You may wish to use both +1 and normal increments and 
compare the results. 


9.9 Do Exercise 9.8 with 


Suggestions for Further Reading 


There are many possibilities for additional reading. We make a few suggestions 
here, but this is not intended to be a complete list. 


Background in probability at an undergraduate level: 


G. Grimmett and D. Stirzaker, Probability and Random Processes, Oxford 
University Press. 
J. Pitman, Probability, Springer-Verlag. 


Stochastic processes at the level of this book: 


G. Grimmett and D. Stirzaker, Probability and Random Processes, Oxford 
University Press. 

S. Karlin and H. Taylor, A First Course in Stochastic Processes and A 
Second Course in Stochastic Processes, Academic Press. 

S. Resnick, Adventures in Stochastic Processes, Birkhäuser.


To pursue stochastic processes at a higher level, it is necessary to have a 
background in advanced calculus (undergraduate real analysis) and measure 
theory. One possibility for each of these is: 


R. Strichartz, The Way of Analysis, Jones and Bartlett Mathematics. 
R. Bartle, The Elements of Integration and Lebesgue Measure, Wiley. 


The next step is to learn probability at a measure-theoretic level. These books
contain some of the measure theory as well:


P. Billingsley, Probability and Measure, Wiley. 

R. Durrett, Probability: Theory and Examples, Thomson Brooks/Cole. 
J. Jacod & P. Protter, Probability Essentials, Springer-Verlag. 

D. Williams, Probability with Martingales, Cambridge University Press. 


For treatments of Brownian motion and stochastic calculus using measure- 
theoretic probability theory: 


K. Chung & R. Williams, An Introduction to Stochastic Integration, Birkhäuser.


R. Durrett, Stochastic Calculus: A Practical Introduction, CRC Press. 


I. Karatzas and S. Shreve, Brownian Motion and Stochastic Calculus, Springer-Verlag.

B. Øksendal, Stochastic Differential Equations, Springer-Verlag.
Focusing on mathematical ideas rather than proofs, Introduction to
Stochastic Processes, Second Edition provides quick access to 
important foundations of probability theory applicable to problems 
in many fields. Approaching all problems and theorems without any 
measure theory, the book provides a concise and informal introduction 
to stochastic processes evolving with time. 


Here’s what’s new in the Second Edition: 


• Expanded chapter on stochastic integration that introduces modern
mathematical finance

• Expanded discussion of Itô's formula including Girsanov theory,
the Feynman-Kac formula, and the Black-Scholes formula in
stochastic integration

• New topics such as Doob's maximal inequality and a discussion
on self similarity in the chapter on Brownian motion


This concise, informal introduction is designed to meet the needs of 
students and professionals not only in mathematics and statistics, but 
in the many fields in which the concepts presented are also important, 
including computer science, economics, business, biological sciences, 
psychology, and engineering. It acquaints readers with the possibilities 
of applying stochastic processes in their work. 